In [2]:
%matplotlib inline
import scipy.optimize
import matplotlib.pyplot as plt

Attribution: These examples are taken from the Scipy Tutorial

The scipy.optimize package provides several commonly used optimization algorithms. A detailed listing can be found by:


In [3]:
help(scipy.optimize)


Help on package scipy.optimize in scipy:

NAME
    scipy.optimize

FILE
    /Volumes/nobackup/Research/anaconda2/envs/nextai/lib/python2.7/site-packages/scipy/optimize/__init__.py

DESCRIPTION
    =====================================================
    Optimization and root finding (:mod:`scipy.optimize`)
    =====================================================
    
    .. currentmodule:: scipy.optimize
    
    Optimization
    ============
    
    Local Optimization
    ------------------
    
    .. autosummary::
       :toctree: generated/
    
       minimize - Unified interface for minimizers of multivariate functions
       minimize_scalar - Unified interface for minimizers of univariate functions
       OptimizeResult - The optimization result returned by some optimizers
       OptimizeWarning - The optimization encountered problems
    
    The `minimize` function supports the following methods:
    
    .. toctree::
    
       optimize.minimize-neldermead
       optimize.minimize-powell
       optimize.minimize-cg
       optimize.minimize-bfgs
       optimize.minimize-newtoncg
       optimize.minimize-lbfgsb
       optimize.minimize-tnc
       optimize.minimize-cobyla
       optimize.minimize-slsqp
       optimize.minimize-dogleg
       optimize.minimize-trustncg
    
    The `minimize_scalar` function supports the following methods:
    
    .. toctree::
    
       optimize.minimize_scalar-brent
       optimize.minimize_scalar-bounded
       optimize.minimize_scalar-golden
    
    The specific optimization method interfaces below in this subsection are
    not recommended for use in new scripts; all of these methods are accessible
    via a newer, more consistent interface provided by the functions above.
    
    General-purpose multivariate methods:
    
    .. autosummary::
       :toctree: generated/
    
       fmin - Nelder-Mead Simplex algorithm
       fmin_powell - Powell's (modified) level set method
       fmin_cg - Non-linear (Polak-Ribiere) conjugate gradient algorithm
       fmin_bfgs - Quasi-Newton method (Broydon-Fletcher-Goldfarb-Shanno)
       fmin_ncg - Line-search Newton Conjugate Gradient
    
    Constrained multivariate methods:
    
    .. autosummary::
       :toctree: generated/
    
       fmin_l_bfgs_b - Zhu, Byrd, and Nocedal's constrained optimizer
       fmin_tnc - Truncated Newton code
       fmin_cobyla - Constrained optimization by linear approximation
       fmin_slsqp - Minimization using sequential least-squares programming
       differential_evolution - stochastic minimization using differential evolution
    
    Univariate (scalar) minimization methods:
    
    .. autosummary::
       :toctree: generated/
    
       fminbound - Bounded minimization of a scalar function
       brent - 1-D function minimization using Brent method
       golden - 1-D function minimization using Golden Section method
    
    Equation (Local) Minimizers
    ---------------------------
    
    .. autosummary::
       :toctree: generated/
       
       leastsq - Minimize the sum of squares of M equations in N unknowns
       least_squares - Feature-rich least-squares minimization.
       nnls - Linear least-squares problem with non-negativity constraint
       lsq_linear - Linear least-squares problem with bound constraints
    
    Global Optimization
    -------------------
    
    .. autosummary::
       :toctree: generated/
    
       basinhopping - Basinhopping stochastic optimizer
       brute - Brute force searching optimizer
       differential_evolution - stochastic minimization using differential evolution
    
    Rosenbrock function
    -------------------
    
    .. autosummary::
       :toctree: generated/
    
       rosen - The Rosenbrock function.
       rosen_der - The derivative of the Rosenbrock function.
       rosen_hess - The Hessian matrix of the Rosenbrock function.
       rosen_hess_prod - Product of the Rosenbrock Hessian with a vector.
    
    Fitting
    =======
    
    .. autosummary::
       :toctree: generated/
    
       curve_fit -- Fit curve to a set of points
    
    Root finding
    ============
    
    Scalar functions
    ----------------
    .. autosummary::
       :toctree: generated/
    
       brentq - quadratic interpolation Brent method
       brenth - Brent method, modified by Harris with hyperbolic extrapolation
       ridder - Ridder's method
       bisect - Bisection method
       newton - Secant method or Newton's method
    
    Fixed point finding:
    
    .. autosummary::
       :toctree: generated/
    
       fixed_point - Single-variable fixed-point solver
    
    Multidimensional
    ----------------
    
    General nonlinear solvers:
    
    .. autosummary::
       :toctree: generated/
    
       root - Unified interface for nonlinear solvers of multivariate functions
       fsolve - Non-linear multi-variable equation solver
       broyden1 - Broyden's first method
       broyden2 - Broyden's second method
    
    The `root` function supports the following methods:
    
    .. toctree::
    
       optimize.root-hybr
       optimize.root-lm
       optimize.root-broyden1
       optimize.root-broyden2
       optimize.root-anderson
       optimize.root-linearmixing
       optimize.root-diagbroyden
       optimize.root-excitingmixing
       optimize.root-krylov
       optimize.root-dfsane
    
    Large-scale nonlinear solvers:
    
    .. autosummary::
       :toctree: generated/
    
       newton_krylov
       anderson
    
    Simple iterations:
    
    .. autosummary::
       :toctree: generated/
    
       excitingmixing
       linearmixing
       diagbroyden
    
    :mod:`Additional information on the nonlinear solvers <scipy.optimize.nonlin>`
    
    Linear Programming
    ==================
    
    Simplex Algorithm:
    
    .. autosummary::
       :toctree: generated/
    
       linprog -- Linear programming using the simplex algorithm
       linprog_verbose_callback -- Sample callback function for linprog
    
    The `linprog` function supports the following methods:
    
    .. toctree::
    
       optimize.linprog-simplex
    
    Assignment problems:
    
    .. autosummary::
       :toctree: generated/
    
       linear_sum_assignment -- Solves the linear-sum assignment problem
    
    
    Utilities
    =========
    
    .. autosummary::
       :toctree: generated/
    
       approx_fprime - Approximate the gradient of a scalar function
       bracket - Bracket a minimum, given two starting points
       check_grad - Check the supplied derivative using finite differences
       line_search - Return a step that satisfies the strong Wolfe conditions
    
       show_options - Show specific options optimization solvers
       LbfgsInvHessProduct - Linear operator for L-BFGS approximate inverse Hessian

PACKAGE CONTENTS
    _basinhopping
    _cobyla
    _differentialevolution
    _group_columns
    _hungarian
    _lbfgsb
    _linprog
    _lsq (package)
    _minimize
    _minpack
    _nnls
    _numdiff
    _root
    _slsqp
    _spectral
    _trustregion
    _trustregion_dogleg
    _trustregion_ncg
    _tstutils
    _zeros
    cobyla
    lbfgsb
    linesearch
    minpack
    minpack2
    moduleTNC
    nnls
    nonlin
    optimize
    setup
    slsqp
    tnc
    zeros

CLASSES
    __builtin__.dict(__builtin__.object)
        scipy.optimize.optimize.OptimizeResult
    exceptions.UserWarning(exceptions.Warning)
        scipy.optimize.optimize.OptimizeWarning
    scipy.sparse.linalg.interface.LinearOperator(__builtin__.object)
        scipy.optimize.lbfgsb.LbfgsInvHessProduct
    
    class LbfgsInvHessProduct(scipy.sparse.linalg.interface.LinearOperator)
     |  Linear operator for the L-BFGS approximate inverse Hessian.
     |  
     |  This operator computes the product of a vector with the approximate inverse
     |  of the Hessian of the objective function, using the L-BFGS limited
     |  memory approximation to the inverse Hessian, accumulated during the
     |  optimization.
     |  
     |  Objects of this class implement the ``scipy.sparse.linalg.LinearOperator``
     |  interface.
     |  
     |  Parameters
     |  ----------
     |  sk : array_like, shape=(n_corr, n)
     |      Array of `n_corr` most recent updates to the solution vector.
     |      (See [1]).
     |  yk : array_like, shape=(n_corr, n)
     |      Array of `n_corr` most recent updates to the gradient. (See [1]).
     |  
     |  References
     |  ----------
     |  .. [1] Nocedal, Jorge. "Updating quasi-Newton matrices with limited
     |     storage." Mathematics of computation 35.151 (1980): 773-782.
     |  
     |  Method resolution order:
     |      LbfgsInvHessProduct
     |      scipy.sparse.linalg.interface.LinearOperator
     |      __builtin__.object
     |  
     |  Methods defined here:
     |  
     |  __init__(self, sk, yk)
     |      Construct the operator.
     |  
     |  todense(self)
     |      Return a dense array representation of this operator.
     |      
     |      Returns
     |      -------
     |      arr : ndarray, shape=(n, n)
     |          An array with the same shape and containing
     |          the same data represented by this `LinearOperator`.
     |  
     |  ----------------------------------------------------------------------
     |  Methods inherited from scipy.sparse.linalg.interface.LinearOperator:
     |  
     |  __add__(self, x)
     |  
     |  __call__(self, x)
     |  
     |  __matmul__(self, other)
     |  
     |  __mul__(self, x)
     |  
     |  __neg__(self)
     |  
     |  __pow__(self, p)
     |  
     |  __repr__(self)
     |  
     |  __rmatmul__(self, other)
     |  
     |  __rmul__(self, x)
     |  
     |  __sub__(self, x)
     |  
     |  adjoint(self)
     |      Hermitian adjoint.
     |      
     |      Returns the Hermitian adjoint of self, aka the Hermitian
     |      conjugate or Hermitian transpose. For a complex matrix, the
     |      Hermitian adjoint is equal to the conjugate transpose.
     |      
     |      Can be abbreviated self.H instead of self.adjoint().
     |      
     |      Returns
     |      -------
     |      A_H : LinearOperator
     |          Hermitian adjoint of self.
     |  
     |  dot(self, x)
     |      Matrix-matrix or matrix-vector multiplication.
     |      
     |      Parameters
     |      ----------
     |      x : array_like
     |          1-d or 2-d array, representing a vector or matrix.
     |      
     |      Returns
     |      -------
     |      Ax : array
     |          1-d or 2-d array (depending on the shape of x) that represents
     |          the result of applying this linear operator on x.
     |  
     |  matmat(self, X)
     |      Matrix-matrix multiplication.
     |      
     |      Performs the operation y=A*X where A is an MxN linear
     |      operator and X dense N*K matrix or ndarray.
     |      
     |      Parameters
     |      ----------
     |      X : {matrix, ndarray}
     |          An array with shape (N,K).
     |      
     |      Returns
     |      -------
     |      Y : {matrix, ndarray}
     |          A matrix or ndarray with shape (M,K) depending on
     |          the type of the X argument.
     |      
     |      Notes
     |      -----
     |      This matmat wraps any user-specified matmat routine or overridden
     |      _matmat method to ensure that y has the correct type.
     |  
     |  matvec(self, x)
     |      Matrix-vector multiplication.
     |      
     |      Performs the operation y=A*x where A is an MxN linear
     |      operator and x is a column vector or 1-d array.
     |      
     |      Parameters
     |      ----------
     |      x : {matrix, ndarray}
     |          An array with shape (N,) or (N,1).
     |      
     |      Returns
     |      -------
     |      y : {matrix, ndarray}
     |          A matrix or ndarray with shape (M,) or (M,1) depending
     |          on the type and shape of the x argument.
     |      
     |      Notes
     |      -----
     |      This matvec wraps the user-specified matvec routine or overridden
     |      _matvec method to ensure that y has the correct shape and type.
     |  
     |  rmatvec(self, x)
     |      Adjoint matrix-vector multiplication.
     |      
     |      Performs the operation y = A^H * x where A is an MxN linear
     |      operator and x is a column vector or 1-d array.
     |      
     |      Parameters
     |      ----------
     |      x : {matrix, ndarray}
     |          An array with shape (M,) or (M,1).
     |      
     |      Returns
     |      -------
     |      y : {matrix, ndarray}
     |          A matrix or ndarray with shape (N,) or (N,1) depending
     |          on the type and shape of the x argument.
     |      
     |      Notes
     |      -----
     |      This rmatvec wraps the user-specified rmatvec routine or overridden
     |      _rmatvec method to ensure that y has the correct shape and type.
     |  
     |  transpose(self)
     |      Transpose this linear operator.
     |      
     |      Returns a LinearOperator that represents the transpose of this one.
     |      Can be abbreviated self.T instead of self.transpose().
     |  
     |  ----------------------------------------------------------------------
     |  Static methods inherited from scipy.sparse.linalg.interface.LinearOperator:
     |  
     |  __new__(cls, *args, **kwargs)
     |  
     |  ----------------------------------------------------------------------
     |  Data descriptors inherited from scipy.sparse.linalg.interface.LinearOperator:
     |  
     |  H
     |      Hermitian adjoint.
     |      
     |      Returns the Hermitian adjoint of self, aka the Hermitian
     |      conjugate or Hermitian transpose. For a complex matrix, the
     |      Hermitian adjoint is equal to the conjugate transpose.
     |      
     |      Can be abbreviated self.H instead of self.adjoint().
     |      
     |      Returns
     |      -------
     |      A_H : LinearOperator
     |          Hermitian adjoint of self.
     |  
     |  T
     |      Transpose this linear operator.
     |      
     |      Returns a LinearOperator that represents the transpose of this one.
     |      Can be abbreviated self.T instead of self.transpose().
     |  
     |  __dict__
     |      dictionary for instance variables (if defined)
     |  
     |  __weakref__
     |      list of weak references to the object (if defined)
    
    class OptimizeResult(__builtin__.dict)
     |  Represents the optimization result.
     |  
     |  Attributes
     |  ----------
     |  x : ndarray
     |      The solution of the optimization.
     |  success : bool
     |      Whether or not the optimizer exited successfully.
     |  status : int
     |      Termination status of the optimizer. Its value depends on the
     |      underlying solver. Refer to `message` for details.
     |  message : str
     |      Description of the cause of the termination.
     |  fun, jac, hess: ndarray
     |      Values of objective function, its Jacobian and its Hessian (if
     |      available). The Hessians may be approximations, see the documentation
     |      of the function in question.
     |  hess_inv : object
     |      Inverse of the objective function's Hessian; may be an approximation.
     |      Not available for all solvers. The type of this attribute may be
     |      either np.ndarray or scipy.sparse.linalg.LinearOperator.
     |  nfev, njev, nhev : int
     |      Number of evaluations of the objective functions and of its
     |      Jacobian and Hessian.
     |  nit : int
     |      Number of iterations performed by the optimizer.
     |  maxcv : float
     |      The maximum constraint violation.
     |  
     |  Notes
     |  -----
     |  There may be additional attributes not listed above depending of the
     |  specific solver. Since this class is essentially a subclass of dict
     |  with attribute accessors, one can see which attributes are available
     |  using the `keys()` method.
     |  
     |  Method resolution order:
     |      OptimizeResult
     |      __builtin__.dict
     |      __builtin__.object
     |  
     |  Methods defined here:
     |  
     |  __delattr__ = __delitem__(...)
     |      x.__delitem__(y) <==> del x[y]
     |  
     |  __dir__(self)
     |  
     |  __getattr__(self, name)
     |  
     |  __repr__(self)
     |  
     |  __setattr__ = __setitem__(...)
     |      x.__setitem__(i, y) <==> x[i]=y
     |  
     |  ----------------------------------------------------------------------
     |  Data descriptors defined here:
     |  
     |  __dict__
     |      dictionary for instance variables (if defined)
     |  
     |  __weakref__
     |      list of weak references to the object (if defined)
     |  
     |  ----------------------------------------------------------------------
     |  Methods inherited from __builtin__.dict:
     |  
     |  __cmp__(...)
     |      x.__cmp__(y) <==> cmp(x,y)
     |  
     |  __contains__(...)
     |      D.__contains__(k) -> True if D has a key k, else False
     |  
     |  __delitem__(...)
     |      x.__delitem__(y) <==> del x[y]
     |  
     |  __eq__(...)
     |      x.__eq__(y) <==> x==y
     |  
     |  __ge__(...)
     |      x.__ge__(y) <==> x>=y
     |  
     |  __getattribute__(...)
     |      x.__getattribute__('name') <==> x.name
     |  
     |  __getitem__(...)
     |      x.__getitem__(y) <==> x[y]
     |  
     |  __gt__(...)
     |      x.__gt__(y) <==> x>y
     |  
     |  __init__(...)
     |      x.__init__(...) initializes x; see help(type(x)) for signature
     |  
     |  __iter__(...)
     |      x.__iter__() <==> iter(x)
     |  
     |  __le__(...)
     |      x.__le__(y) <==> x<=y
     |  
     |  __len__(...)
     |      x.__len__() <==> len(x)
     |  
     |  __lt__(...)
     |      x.__lt__(y) <==> x<y
     |  
     |  __ne__(...)
     |      x.__ne__(y) <==> x!=y
     |  
     |  __setitem__(...)
     |      x.__setitem__(i, y) <==> x[i]=y
     |  
     |  __sizeof__(...)
     |      D.__sizeof__() -> size of D in memory, in bytes
     |  
     |  clear(...)
     |      D.clear() -> None.  Remove all items from D.
     |  
     |  copy(...)
     |      D.copy() -> a shallow copy of D
     |  
     |  fromkeys(...)
     |      dict.fromkeys(S[,v]) -> New dict with keys from S and values equal to v.
     |      v defaults to None.
     |  
     |  get(...)
     |      D.get(k[,d]) -> D[k] if k in D, else d.  d defaults to None.
     |  
     |  has_key(...)
     |      D.has_key(k) -> True if D has a key k, else False
     |  
     |  items(...)
     |      D.items() -> list of D's (key, value) pairs, as 2-tuples
     |  
     |  iteritems(...)
     |      D.iteritems() -> an iterator over the (key, value) items of D
     |  
     |  iterkeys(...)
     |      D.iterkeys() -> an iterator over the keys of D
     |  
     |  itervalues(...)
     |      D.itervalues() -> an iterator over the values of D
     |  
     |  keys(...)
     |      D.keys() -> list of D's keys
     |  
     |  pop(...)
     |      D.pop(k[,d]) -> v, remove specified key and return the corresponding value.
     |      If key is not found, d is returned if given, otherwise KeyError is raised
     |  
     |  popitem(...)
     |      D.popitem() -> (k, v), remove and return some (key, value) pair as a
     |      2-tuple; but raise KeyError if D is empty.
     |  
     |  setdefault(...)
     |      D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D
     |  
     |  update(...)
     |      D.update([E, ]**F) -> None.  Update D from dict/iterable E and F.
     |      If E present and has a .keys() method, does:     for k in E: D[k] = E[k]
     |      If E present and lacks .keys() method, does:     for (k, v) in E: D[k] = v
     |      In either case, this is followed by: for k in F: D[k] = F[k]
     |  
     |  values(...)
     |      D.values() -> list of D's values
     |  
     |  viewitems(...)
     |      D.viewitems() -> a set-like object providing a view on D's items
     |  
     |  viewkeys(...)
     |      D.viewkeys() -> a set-like object providing a view on D's keys
     |  
     |  viewvalues(...)
     |      D.viewvalues() -> an object providing a view on D's values
     |  
     |  ----------------------------------------------------------------------
     |  Data and other attributes inherited from __builtin__.dict:
     |  
     |  __hash__ = None
     |  
     |  __new__ = <built-in method __new__ of type object>
     |      T.__new__(S, ...) -> a new object with type S, a subtype of T
    
    class OptimizeWarning(exceptions.UserWarning)
     |  Method resolution order:
     |      OptimizeWarning
     |      exceptions.UserWarning
     |      exceptions.Warning
     |      exceptions.Exception
     |      exceptions.BaseException
     |      __builtin__.object
     |  
     |  Data descriptors defined here:
     |  
     |  __weakref__
     |      list of weak references to the object (if defined)
     |  
     |  ----------------------------------------------------------------------
     |  Methods inherited from exceptions.UserWarning:
     |  
     |  __init__(...)
     |      x.__init__(...) initializes x; see help(type(x)) for signature
     |  
     |  ----------------------------------------------------------------------
     |  Data and other attributes inherited from exceptions.UserWarning:
     |  
     |  __new__ = <built-in method __new__ of type object>
     |      T.__new__(S, ...) -> a new object with type S, a subtype of T
     |  
     |  ----------------------------------------------------------------------
     |  Methods inherited from exceptions.BaseException:
     |  
     |  __delattr__(...)
     |      x.__delattr__('name') <==> del x.name
     |  
     |  __getattribute__(...)
     |      x.__getattribute__('name') <==> x.name
     |  
     |  __getitem__(...)
     |      x.__getitem__(y) <==> x[y]
     |  
     |  __getslice__(...)
     |      x.__getslice__(i, j) <==> x[i:j]
     |      
     |      Use of negative indices is not supported.
     |  
     |  __reduce__(...)
     |  
     |  __repr__(...)
     |      x.__repr__() <==> repr(x)
     |  
     |  __setattr__(...)
     |      x.__setattr__('name', value) <==> x.name = value
     |  
     |  __setstate__(...)
     |  
     |  __str__(...)
     |      x.__str__() <==> str(x)
     |  
     |  __unicode__(...)
     |  
     |  ----------------------------------------------------------------------
     |  Data descriptors inherited from exceptions.BaseException:
     |  
     |  __dict__
     |  
     |  args
     |  
     |  message

FUNCTIONS
    anderson(F, xin, iter=None, alpha=None, w0=0.01, M=5, verbose=False, maxiter=None, f_tol=None, f_rtol=None, x_tol=None, x_rtol=None, tol_norm=None, line_search='armijo', callback=None, **kw)
        Find a root of a function, using (extended) Anderson mixing.
        
        The Jacobian is formed by for a 'best' solution in the space
        spanned by last `M` vectors. As a result, only a MxM matrix
        inversions and MxN multiplications are required. [Ey]_
        
        Parameters
        ----------
        F : function(x) -> f
            Function whose root to find; should take and return an array-like
            object.
        x0 : array_like
            Initial guess for the solution
        alpha : float, optional
            Initial guess for the Jacobian is (-1/alpha).
        M : float, optional
            Number of previous vectors to retain. Defaults to 5.
        w0 : float, optional
            Regularization parameter for numerical stability.
            Compared to unity, good values of the order of 0.01.
        iter : int, optional
            Number of iterations to make. If omitted (default), make as many
            as required to meet tolerances.
        verbose : bool, optional
            Print status to stdout on every iteration.
        maxiter : int, optional
            Maximum number of iterations to make. If more are needed to
            meet convergence, `NoConvergence` is raised.
        f_tol : float, optional
            Absolute tolerance (in max-norm) for the residual.
            If omitted, default is 6e-6.
        f_rtol : float, optional
            Relative tolerance for the residual. If omitted, not used.
        x_tol : float, optional
            Absolute minimum step size, as determined from the Jacobian
            approximation. If the step size is smaller than this, optimization
            is terminated as successful. If omitted, not used.
        x_rtol : float, optional
            Relative minimum step size. If omitted, not used.
        tol_norm : function(vector) -> scalar, optional
            Norm to use in convergence check. Default is the maximum norm.
        line_search : {None, 'armijo' (default), 'wolfe'}, optional
            Which type of a line search to use to determine the step size in the
            direction given by the Jacobian approximation. Defaults to 'armijo'.
        callback : function, optional
            Optional callback function. It is called on every iteration as
            ``callback(x, f)`` where `x` is the current solution and `f`
            the corresponding residual.
        
        Returns
        -------
        sol : ndarray
            An array (of similar array type as `x0`) containing the final solution.
        
        Raises
        ------
        NoConvergence
            When a solution was not found.
        
        References
        ----------
        .. [Ey] V. Eyert, J. Comp. Phys., 124, 271 (1996).
    
    approx_fprime(xk, f, epsilon, *args)
        Finite-difference approximation of the gradient of a scalar function.
        
        Parameters
        ----------
        xk : array_like
            The coordinate vector at which to determine the gradient of `f`.
        f : callable
            The function of which to determine the gradient (partial derivatives).
            Should take `xk` as first argument, other arguments to `f` can be
            supplied in ``*args``.  Should return a scalar, the value of the
            function at `xk`.
        epsilon : array_like
            Increment to `xk` to use for determining the function gradient.
            If a scalar, uses the same finite difference delta for all partial
            derivatives.  If an array, should contain one value per element of
            `xk`.
        \*args : args, optional
            Any other arguments that are to be passed to `f`.
        
        Returns
        -------
        grad : ndarray
            The partial derivatives of `f` to `xk`.
        
        See Also
        --------
        check_grad : Check correctness of gradient function against approx_fprime.
        
        Notes
        -----
        The function gradient is determined by the forward finite difference
        formula::
        
                     f(xk[i] + epsilon[i]) - f(xk[i])
            f'[i] = ---------------------------------
                                epsilon[i]
        
        The main use of `approx_fprime` is in scalar function optimizers like
        `fmin_bfgs`, to determine numerically the Jacobian of a function.
        
        Examples
        --------
        >>> from scipy import optimize
        >>> def func(x, c0, c1):
        ...     "Coordinate vector `x` should be an array of size two."
        ...     return c0 * x[0]**2 + c1*x[1]**2
        
        >>> x = np.ones(2)
        >>> c0, c1 = (1, 200)
        >>> eps = np.sqrt(np.finfo(float).eps)
        >>> optimize.approx_fprime(x, func, [eps, np.sqrt(200) * eps], c0, c1)
        array([   2.        ,  400.00004198])
    
    basinhopping(func, x0, niter=100, T=1.0, stepsize=0.5, minimizer_kwargs=None, take_step=None, accept_test=None, callback=None, interval=50, disp=False, niter_success=None)
        Find the global minimum of a function using the basin-hopping algorithm
        
        Parameters
        ----------
        func : callable ``f(x, *args)``
            Function to be optimized.  ``args`` can be passed as an optional item
            in the dict ``minimizer_kwargs``
        x0 : ndarray
            Initial guess.
        niter : integer, optional
            The number of basin hopping iterations
        T : float, optional
            The "temperature" parameter for the accept or reject criterion.  Higher
            "temperatures" mean that larger jumps in function value will be
            accepted.  For best results ``T`` should be comparable to the
            separation
            (in function value) between local minima.
        stepsize : float, optional
            initial step size for use in the random displacement.
        minimizer_kwargs : dict, optional
            Extra keyword arguments to be passed to the minimizer
            ``scipy.optimize.minimize()`` Some important options could be:
        
                method : str
                    The minimization method (e.g. ``"L-BFGS-B"``)
                args : tuple
                    Extra arguments passed to the objective function (``func``) and
                    its derivatives (Jacobian, Hessian).
        
        take_step : callable ``take_step(x)``, optional
            Replace the default step taking routine with this routine.  The default
            step taking routine is a random displacement of the coordinates, but
            other step taking algorithms may be better for some systems.
            ``take_step`` can optionally have the attribute ``take_step.stepsize``.
            If this attribute exists, then ``basinhopping`` will adjust
            ``take_step.stepsize`` in order to try to optimize the global minimum
            search.
        accept_test : callable, ``accept_test(f_new=f_new, x_new=x_new, f_old=fold, x_old=x_old)``, optional
            Define a test which will be used to judge whether or not to accept the
            step.  This will be used in addition to the Metropolis test based on
            "temperature" ``T``.  The acceptable return values are True,
            False, or ``"force accept"``. If any of the tests return False
            then the step is rejected. If the latter, then this will override any
            other tests in order to accept the step. This can be used, for example,
            to forcefully escape from a local minimum that ``basinhopping`` is
            trapped in.
        callback : callable, ``callback(x, f, accept)``, optional
            A callback function which will be called for all minima found.  ``x``
            and ``f`` are the coordinates and function value of the trial minimum,
            and ``accept`` is whether or not that minimum was accepted.  This can be
            used, for example, to save the lowest N minima found.  Also,
            ``callback`` can be used to specify a user defined stop criterion by
            optionally returning True to stop the ``basinhopping`` routine.
        interval : integer, optional
            interval for how often to update the ``stepsize``
        disp : bool, optional
            Set to True to print status messages
        niter_success : integer, optional
            Stop the run if the global minimum candidate remains the same for this
            number of iterations.
        
        
        Returns
        -------
        res : OptimizeResult
            The optimization result represented as a ``OptimizeResult`` object.  Important
            attributes are: ``x`` the solution array, ``fun`` the value of the
            function at the solution, and ``message`` which describes the cause of
            the termination. The ``OptimzeResult`` object returned by the selected
            minimizer at the lowest minimum is also contained within this object
            and can be accessed through the ``lowest_optimization_result`` attribute.
            See `OptimizeResult` for a description of other attributes.
        
        See Also
        --------
        minimize :
            The local minimization function called once for each basinhopping step.
            ``minimizer_kwargs`` is passed to this routine.
        
        Notes
        -----
        Basin-hopping is a stochastic algorithm which attempts to find the global
        minimum of a smooth scalar function of one or more variables [1]_ [2]_ [3]_
        [4]_.  The algorithm in its current form was described by David Wales and
        Jonathan Doye [2]_ http://www-wales.ch.cam.ac.uk/.
        
        The algorithm is iterative with each cycle composed of the following
        features
        
        1) random perturbation of the coordinates
        
        2) local minimization
        
        3) accept or reject the new coordinates based on the minimized function
           value
        
        The acceptance test used here is the Metropolis criterion of standard Monte
        Carlo algorithms, although there are many other possibilities [3]_.
        
        This global minimization method has been shown to be extremely efficient
        for a wide variety of problems in physics and chemistry.  It is
        particularly useful when the function has many minima separated by large
        barriers. See the Cambridge Cluster Database
        http://www-wales.ch.cam.ac.uk/CCD.html for databases of molecular systems
        that have been optimized primarily using basin-hopping.  This database
        includes minimization problems exceeding 300 degrees of freedom.
        
        See the free software program GMIN (http://www-wales.ch.cam.ac.uk/GMIN) for
        a Fortran implementation of basin-hopping.  This implementation has many
        different variations of the procedure described above, including more
        advanced step taking algorithms and alternate acceptance criterion.
        
        For stochastic global optimization there is no way to determine if the true
        global minimum has actually been found. Instead, as a consistency check,
        the algorithm can be run from a number of different random starting points
        to ensure the lowest minimum found in each example has converged to the
        global minimum.  For this reason ``basinhopping`` will by default simply
        run for the number of iterations ``niter`` and return the lowest minimum
        found.  It is left to the user to ensure that this is in fact the global
        minimum.
        
        Choosing ``stepsize``:  This is a crucial parameter in ``basinhopping`` and
        depends on the problem being solved.  Ideally it should be comparable to
        the typical separation between local minima of the function being
        optimized.  ``basinhopping`` will, by default, adjust ``stepsize`` to find
        an optimal value, but this may take many iterations.  You will get quicker
        results if you set a sensible value for ``stepsize``.
        
        Choosing ``T``: The parameter ``T`` is the temperature used in the
        metropolis criterion.  Basinhopping steps are accepted with probability
        ``1`` if ``func(xnew) < func(xold)``, or otherwise with probability::
        
            exp( -(func(xnew) - func(xold)) / T )
        
        So, for best results, ``T`` should to be comparable to the typical
        difference in function values between local minima.
        
        .. versionadded:: 0.12.0
        
        References
        ----------
        .. [1] Wales, David J. 2003, Energy Landscapes, Cambridge University Press,
            Cambridge, UK.
        .. [2] Wales, D J, and Doye J P K, Global Optimization by Basin-Hopping and
            the Lowest Energy Structures of Lennard-Jones Clusters Containing up to
            110 Atoms.  Journal of Physical Chemistry A, 1997, 101, 5111.
        .. [3] Li, Z. and Scheraga, H. A., Monte Carlo-minimization approach to the
            multiple-minima problem in protein folding, Proc. Natl. Acad. Sci. USA,
            1987, 84, 6611.
        .. [4] Wales, D. J. and Scheraga, H. A., Global optimization of clusters,
            crystals, and biomolecules, Science, 1999, 285, 1368.
        
        Examples
        --------
        The following example is a one-dimensional minimization problem,  with many
        local minima superimposed on a parabola.
        
        >>> from scipy.optimize import basinhopping
        >>> func = lambda x: np.cos(14.5 * x - 0.3) + (x + 0.2) * x
        >>> x0=[1.]
        
        Basinhopping, internally, uses a local minimization algorithm.  We will use
        the parameter ``minimizer_kwargs`` to tell basinhopping which algorithm to
        use and how to set up that minimizer.  This parameter will be passed to
        ``scipy.optimize.minimize()``.
        
        >>> minimizer_kwargs = {"method": "BFGS"}
        >>> ret = basinhopping(func, x0, minimizer_kwargs=minimizer_kwargs,
        ...                    niter=200)
        >>> print("global minimum: x = %.4f, f(x0) = %.4f" % (ret.x, ret.fun))
        global minimum: x = -0.1951, f(x0) = -1.0009
        
        Next consider a two-dimensional minimization problem. Also, this time we
        will use gradient information to significantly speed up the search.
        
        >>> def func2d(x):
        ...     f = np.cos(14.5 * x[0] - 0.3) + (x[1] + 0.2) * x[1] + (x[0] +
        ...                                                            0.2) * x[0]
        ...     df = np.zeros(2)
        ...     df[0] = -14.5 * np.sin(14.5 * x[0] - 0.3) + 2. * x[0] + 0.2
        ...     df[1] = 2. * x[1] + 0.2
        ...     return f, df
        
        We'll also use a different local minimization algorithm.  Also we must tell
        the minimizer that our function returns both energy and gradient (jacobian)
        
        >>> minimizer_kwargs = {"method":"L-BFGS-B", "jac":True}
        >>> x0 = [1.0, 1.0]
        >>> ret = basinhopping(func2d, x0, minimizer_kwargs=minimizer_kwargs,
        ...                    niter=200)
        >>> print("global minimum: x = [%.4f, %.4f], f(x0) = %.4f" % (ret.x[0],
        ...                                                           ret.x[1],
        ...                                                           ret.fun))
        global minimum: x = [-0.1951, -0.1000], f(x0) = -1.0109
        
        
        Here is an example using a custom step taking routine.  Imagine you want
        the first coordinate to take larger steps then the rest of the coordinates.
        This can be implemented like so:
        
        >>> class MyTakeStep(object):
        ...    def __init__(self, stepsize=0.5):
        ...        self.stepsize = stepsize
        ...    def __call__(self, x):
        ...        s = self.stepsize
        ...        x[0] += np.random.uniform(-2.*s, 2.*s)
        ...        x[1:] += np.random.uniform(-s, s, x[1:].shape)
        ...        return x
        
        Since ``MyTakeStep.stepsize`` exists basinhopping will adjust the magnitude
        of ``stepsize`` to optimize the search.  We'll use the same 2-D function as
        before
        
        >>> mytakestep = MyTakeStep()
        >>> ret = basinhopping(func2d, x0, minimizer_kwargs=minimizer_kwargs,
        ...                    niter=200, take_step=mytakestep)
        >>> print("global minimum: x = [%.4f, %.4f], f(x0) = %.4f" % (ret.x[0],
        ...                                                           ret.x[1],
        ...                                                           ret.fun))
        global minimum: x = [-0.1951, -0.1000], f(x0) = -1.0109
        
        
        Now let's do an example using a custom callback function which prints the
        value of every minimum found
        
        >>> def print_fun(x, f, accepted):
        ...         print("at minimum %.4f accepted %d" % (f, int(accepted)))
        
        We'll run it for only 10 basinhopping steps this time.
        
        >>> np.random.seed(1)
        >>> ret = basinhopping(func2d, x0, minimizer_kwargs=minimizer_kwargs,
        ...                    niter=10, callback=print_fun)
        at minimum 0.4159 accepted 1
        at minimum -0.9073 accepted 1
        at minimum -0.1021 accepted 1
        at minimum -0.1021 accepted 1
        at minimum 0.9102 accepted 1
        at minimum 0.9102 accepted 1
        at minimum 2.2945 accepted 0
        at minimum -0.1021 accepted 1
        at minimum -1.0109 accepted 1
        at minimum -1.0109 accepted 1
        
        
        The minimum at -1.0109 is actually the global minimum, found already on the
        8th iteration.
        
        Now let's implement bounds on the problem using a custom ``accept_test``:
        
        >>> class MyBounds(object):
        ...     def __init__(self, xmax=[1.1,1.1], xmin=[-1.1,-1.1] ):
        ...         self.xmax = np.array(xmax)
        ...         self.xmin = np.array(xmin)
        ...     def __call__(self, **kwargs):
        ...         x = kwargs["x_new"]
        ...         tmax = bool(np.all(x <= self.xmax))
        ...         tmin = bool(np.all(x >= self.xmin))
        ...         return tmax and tmin
        
        >>> mybounds = MyBounds()
        >>> ret = basinhopping(func2d, x0, minimizer_kwargs=minimizer_kwargs,
        ...                    niter=10, accept_test=mybounds)
    
    bisect(f, a, b, args=(), xtol=2e-12, rtol=8.8817841970012523e-16, maxiter=100, full_output=False, disp=True)
        Find root of a function within an interval.
        
        Basic bisection routine to find a zero of the function `f` between the
        arguments `a` and `b`. `f(a)` and `f(b)` cannot have the same signs.
        Slow but sure.
        
        Parameters
        ----------
        f : function
            Python function returning a number.  `f` must be continuous, and
            f(a) and f(b) must have opposite signs.
        a : number
            One end of the bracketing interval [a,b].
        b : number
            The other end of the bracketing interval [a,b].
        xtol : number, optional
            The computed root ``x0`` will satisfy ``np.allclose(x, x0,
            atol=xtol, rtol=rtol)``, where ``x`` is the exact root. The
            parameter must be nonnegative.
        rtol : number, optional
            The computed root ``x0`` will satisfy ``np.allclose(x, x0,
            atol=xtol, rtol=rtol)``, where ``x`` is the exact root. The
            parameter cannot be smaller than its default value of
            ``4*np.finfo(float).eps``.
        maxiter : number, optional
            if convergence is not achieved in `maxiter` iterations, an error is
            raised.  Must be >= 0.
        args : tuple, optional
            containing extra arguments for the function `f`.
            `f` is called by ``apply(f, (x)+args)``.
        full_output : bool, optional
            If `full_output` is False, the root is returned.  If `full_output` is
            True, the return value is ``(x, r)``, where x is the root, and r is
            a `RootResults` object.
        disp : bool, optional
            If True, raise RuntimeError if the algorithm didn't converge.
        
        Returns
        -------
        x0 : float
            Zero of `f` between `a` and `b`.
        r : RootResults (present if ``full_output = True``)
            Object containing information about the convergence.  In particular,
            ``r.converged`` is True if the routine converged.
        
        See Also
        --------
        brentq, brenth, bisect, newton
        fixed_point : scalar fixed-point finder
        fsolve : n-dimensional root-finding
    
    bracket(func, xa=0.0, xb=1.0, args=(), grow_limit=110.0, maxiter=1000)
        Bracket the minimum of the function.
        
        Given a function and distinct initial points, search in the
        downhill direction (as defined by the initital points) and return
        new points xa, xb, xc that bracket the minimum of the function
        f(xa) > f(xb) < f(xc). It doesn't always mean that obtained
        solution will satisfy xa<=x<=xb
        
        Parameters
        ----------
        func : callable f(x,*args)
            Objective function to minimize.
        xa, xb : float, optional
            Bracketing interval. Defaults `xa` to 0.0, and `xb` to 1.0.
        args : tuple, optional
            Additional arguments (if present), passed to `func`.
        grow_limit : float, optional
            Maximum grow limit.  Defaults to 110.0
        maxiter : int, optional
            Maximum number of iterations to perform. Defaults to 1000.
        
        Returns
        -------
        xa, xb, xc : float
            Bracket.
        fa, fb, fc : float
            Objective function values in bracket.
        funcalls : int
            Number of function evaluations made.
    
    brent(func, args=(), brack=None, tol=1.48e-08, full_output=0, maxiter=500)
        Given a function of one-variable and a possible bracketing interval,
        return the minimum of the function isolated to a fractional precision of
        tol.
        
        Parameters
        ----------
        func : callable f(x,*args)
            Objective function.
        args : tuple, optional
            Additional arguments (if present).
        brack : tuple, optional
            Either a triple (xa,xb,xc) where xa<xb<xc and func(xb) <
            func(xa), func(xc) or a pair (xa,xb) which are used as a
            starting interval for a downhill bracket search (see
            `bracket`). Providing the pair (xa,xb) does not always mean
            the obtained solution will satisfy xa<=x<=xb.
        tol : float, optional
            Stop if between iteration change is less than `tol`.
        full_output : bool, optional
            If True, return all output args (xmin, fval, iter,
            funcalls).
        maxiter : int, optional
            Maximum number of iterations in solution.
        
        Returns
        -------
        xmin : ndarray
            Optimum point.
        fval : float
            Optimum value.
        iter : int
            Number of iterations.
        funcalls : int
            Number of objective function evaluations made.
        
        See also
        --------
        minimize_scalar: Interface to minimization algorithms for scalar
            univariate functions. See the 'Brent' `method` in particular.
        
        Notes
        -----
        Uses inverse parabolic interpolation when possible to speed up
        convergence of golden section method.
    
    brenth(f, a, b, args=(), xtol=2e-12, rtol=8.8817841970012523e-16, maxiter=100, full_output=False, disp=True)
        Find root of f in [a,b].
        
        A variation on the classic Brent routine to find a zero of the function f
        between the arguments a and b that uses hyperbolic extrapolation instead of
        inverse quadratic extrapolation. There was a paper back in the 1980's ...
        f(a) and f(b) cannot have the same signs. Generally on a par with the
        brent routine, but not as heavily tested.  It is a safe version of the
        secant method that uses hyperbolic extrapolation. The version here is by
        Chuck Harris.
        
        Parameters
        ----------
        f : function
            Python function returning a number.  f must be continuous, and f(a) and
            f(b) must have opposite signs.
        a : number
            One end of the bracketing interval [a,b].
        b : number
            The other end of the bracketing interval [a,b].
        xtol : number, optional
            The computed root ``x0`` will satisfy ``np.allclose(x, x0,
            atol=xtol, rtol=rtol)``, where ``x`` is the exact root. The
            parameter must be nonnegative. As with `brentq`, for nice
            functions the method will often satisfy the above condition
            will ``xtol/2`` and ``rtol/2``.
        rtol : number, optional
            The computed root ``x0`` will satisfy ``np.allclose(x, x0,
            atol=xtol, rtol=rtol)``, where ``x`` is the exact root. The
            parameter cannot be smaller than its default value of
            ``4*np.finfo(float).eps``. As with `brentq`, for nice functions
            the method will often satisfy the above condition will
            ``xtol/2`` and ``rtol/2``.
        maxiter : number, optional
            if convergence is not achieved in maxiter iterations, an error is
            raised.  Must be >= 0.
        args : tuple, optional
            containing extra arguments for the function `f`.
            `f` is called by ``apply(f, (x)+args)``.
        full_output : bool, optional
            If `full_output` is False, the root is returned.  If `full_output` is
            True, the return value is ``(x, r)``, where `x` is the root, and `r` is
            a RootResults object.
        disp : bool, optional
            If True, raise RuntimeError if the algorithm didn't converge.
        
        Returns
        -------
        x0 : float
            Zero of `f` between `a` and `b`.
        r : RootResults (present if ``full_output = True``)
            Object containing information about the convergence.  In particular,
            ``r.converged`` is True if the routine converged.
        
        See Also
        --------
        fmin, fmin_powell, fmin_cg,
               fmin_bfgs, fmin_ncg : multivariate local optimizers
        
        leastsq : nonlinear least squares minimizer
        
        fmin_l_bfgs_b, fmin_tnc, fmin_cobyla : constrained multivariate optimizers
        
        basinhopping, differential_evolution, brute : global optimizers
        
        fminbound, brent, golden, bracket : local scalar minimizers
        
        fsolve : n-dimensional root-finding
        
        brentq, brenth, ridder, bisect, newton : one-dimensional root-finding
        
        fixed_point : scalar fixed-point finder
    
    brentq(f, a, b, args=(), xtol=2e-12, rtol=8.8817841970012523e-16, maxiter=100, full_output=False, disp=True)
        Find a root of a function in a bracketing interval using Brent's method.
        
        Uses the classic Brent's method to find a zero of the function `f` on
        the sign changing interval [a , b].  Generally considered the best of the
        rootfinding routines here.  It is a safe version of the secant method that
        uses inverse quadratic extrapolation.  Brent's method combines root
        bracketing, interval bisection, and inverse quadratic interpolation.  It is
        sometimes known as the van Wijngaarden-Dekker-Brent method.  Brent (1973)
        claims convergence is guaranteed for functions computable within [a,b].
        
        [Brent1973]_ provides the classic description of the algorithm.  Another
        description can be found in a recent edition of Numerical Recipes, including
        [PressEtal1992]_.  Another description is at
        http://mathworld.wolfram.com/BrentsMethod.html.  It should be easy to
        understand the algorithm just by reading our code.  Our code diverges a bit
        from standard presentations: we choose a different formula for the
        extrapolation step.
        
        Parameters
        ----------
        f : function
            Python function returning a number.  The function :math:`f`
            must be continuous, and :math:`f(a)` and :math:`f(b)` must
            have opposite signs.
        a : number
            One end of the bracketing interval :math:`[a, b]`.
        b : number
            The other end of the bracketing interval :math:`[a, b]`.
        xtol : number, optional
            The computed root ``x0`` will satisfy ``np.allclose(x, x0,
            atol=xtol, rtol=rtol)``, where ``x`` is the exact root. The
            parameter must be nonnegative. For nice functions, Brent's
            method will often satisfy the above condition will ``xtol/2``
            and ``rtol/2``. [Brent1973]_
        rtol : number, optional
            The computed root ``x0`` will satisfy ``np.allclose(x, x0,
            atol=xtol, rtol=rtol)``, where ``x`` is the exact root. The
            parameter cannot be smaller than its default value of
            ``4*np.finfo(float).eps``. For nice functions, Brent's
            method will often satisfy the above condition will ``xtol/2``
            and ``rtol/2``. [Brent1973]_
        maxiter : number, optional
            if convergence is not achieved in maxiter iterations, an error is
            raised.  Must be >= 0.
        args : tuple, optional
            containing extra arguments for the function `f`.
            `f` is called by ``apply(f, (x)+args)``.
        full_output : bool, optional
            If `full_output` is False, the root is returned.  If `full_output` is
            True, the return value is ``(x, r)``, where `x` is the root, and `r` is
            a RootResults object.
        disp : bool, optional
            If True, raise RuntimeError if the algorithm didn't converge.
        
        Returns
        -------
        x0 : float
            Zero of `f` between `a` and `b`.
        r : RootResults (present if ``full_output = True``)
            Object containing information about the convergence.  In particular,
            ``r.converged`` is True if the routine converged.
        
        See Also
        --------
        multivariate local optimizers
          `fmin`, `fmin_powell`, `fmin_cg`, `fmin_bfgs`, `fmin_ncg`
        nonlinear least squares minimizer
          `leastsq`
        constrained multivariate optimizers
          `fmin_l_bfgs_b`, `fmin_tnc`, `fmin_cobyla`
        global optimizers
          `basinhopping`, `brute`, `differential_evolution`
        local scalar minimizers
          `fminbound`, `brent`, `golden`, `bracket`
        n-dimensional root-finding
          `fsolve`
        one-dimensional root-finding
          `brentq`, `brenth`, `ridder`, `bisect`, `newton`
        scalar fixed-point finder
          `fixed_point`
        
        Notes
        -----
        `f` must be continuous.  f(a) and f(b) must have opposite signs.
        
        
        References
        ----------
        .. [Brent1973]
           Brent, R. P.,
           *Algorithms for Minimization Without Derivatives*.
           Englewood Cliffs, NJ: Prentice-Hall, 1973. Ch. 3-4.
        
        .. [PressEtal1992]
           Press, W. H.; Flannery, B. P.; Teukolsky, S. A.; and Vetterling, W. T.
           *Numerical Recipes in FORTRAN: The Art of Scientific Computing*, 2nd ed.
           Cambridge, England: Cambridge University Press, pp. 352-355, 1992.
           Section 9.3:  "Van Wijngaarden-Dekker-Brent Method."
    
    broyden1(F, xin, iter=None, alpha=None, reduction_method='restart', max_rank=None, verbose=False, maxiter=None, f_tol=None, f_rtol=None, x_tol=None, x_rtol=None, tol_norm=None, line_search='armijo', callback=None, **kw)
        Find a root of a function, using Broyden's first Jacobian approximation.
        
        This method is also known as \"Broyden's good method\".
        
        Parameters
        ----------
        F : function(x) -> f
            Function whose root to find; should take and return an array-like
            object.
        x0 : array_like
            Initial guess for the solution
        alpha : float, optional
            Initial guess for the Jacobian is ``(-1/alpha)``.
        reduction_method : str or tuple, optional
            Method used in ensuring that the rank of the Broyden matrix
            stays low. Can either be a string giving the name of the method,
            or a tuple of the form ``(method, param1, param2, ...)``
            that gives the name of the method and values for additional parameters.
        
            Methods available:
        
                - ``restart``: drop all matrix columns. Has no extra parameters.
                - ``simple``: drop oldest matrix column. Has no extra parameters.
                - ``svd``: keep only the most significant SVD components.
                  Takes an extra parameter, ``to_retain``, which determines the
                  number of SVD components to retain when rank reduction is done.
                  Default is ``max_rank - 2``.
        
        max_rank : int, optional
            Maximum rank for the Broyden matrix.
            Default is infinity (ie., no rank reduction).
        iter : int, optional
            Number of iterations to make. If omitted (default), make as many
            as required to meet tolerances.
        verbose : bool, optional
            Print status to stdout on every iteration.
        maxiter : int, optional
            Maximum number of iterations to make. If more are needed to
            meet convergence, `NoConvergence` is raised.
        f_tol : float, optional
            Absolute tolerance (in max-norm) for the residual.
            If omitted, default is 6e-6.
        f_rtol : float, optional
            Relative tolerance for the residual. If omitted, not used.
        x_tol : float, optional
            Absolute minimum step size, as determined from the Jacobian
            approximation. If the step size is smaller than this, optimization
            is terminated as successful. If omitted, not used.
        x_rtol : float, optional
            Relative minimum step size. If omitted, not used.
        tol_norm : function(vector) -> scalar, optional
            Norm to use in convergence check. Default is the maximum norm.
        line_search : {None, 'armijo' (default), 'wolfe'}, optional
            Which type of a line search to use to determine the step size in the
            direction given by the Jacobian approximation. Defaults to 'armijo'.
        callback : function, optional
            Optional callback function. It is called on every iteration as
            ``callback(x, f)`` where `x` is the current solution and `f`
            the corresponding residual.
        
        Returns
        -------
        sol : ndarray
            An array (of similar array type as `x0`) containing the final solution.
        
        Raises
        ------
        NoConvergence
            When a solution was not found.
        
        Notes
        -----
        This algorithm implements the inverse Jacobian Quasi-Newton update
        
        .. math:: H_+ = H + (dx - H df) dx^\dagger H / ( dx^\dagger H df)
        
        which corresponds to Broyden's first Jacobian update
        
        .. math:: J_+ = J + (df - J dx) dx^\dagger / dx^\dagger dx
        
        
        References
        ----------
        .. [1] B.A. van der Rotten, PhD thesis,
           \"A limited memory Broyden method to solve high-dimensional
           systems of nonlinear equations\". Mathematisch Instituut,
           Universiteit Leiden, The Netherlands (2003).
        
           http://www.math.leidenuniv.nl/scripties/Rotten.pdf
    
    broyden2(F, xin, iter=None, alpha=None, reduction_method='restart', max_rank=None, verbose=False, maxiter=None, f_tol=None, f_rtol=None, x_tol=None, x_rtol=None, tol_norm=None, line_search='armijo', callback=None, **kw)
        Find a root of a function, using Broyden's second Jacobian approximation.
        
        This method is also known as "Broyden's bad method".
        
        Parameters
        ----------
        F : function(x) -> f
            Function whose root to find; should take and return an array-like
            object.
        x0 : array_like
            Initial guess for the solution
        alpha : float, optional
            Initial guess for the Jacobian is ``(-1/alpha)``.
        reduction_method : str or tuple, optional
            Method used in ensuring that the rank of the Broyden matrix
            stays low. Can either be a string giving the name of the method,
            or a tuple of the form ``(method, param1, param2, ...)``
            that gives the name of the method and values for additional parameters.
        
            Methods available:
        
                - ``restart``: drop all matrix columns. Has no extra parameters.
                - ``simple``: drop oldest matrix column. Has no extra parameters.
                - ``svd``: keep only the most significant SVD components.
                  Takes an extra parameter, ``to_retain``, which determines the
                  number of SVD components to retain when rank reduction is done.
                  Default is ``max_rank - 2``.
        
        max_rank : int, optional
            Maximum rank for the Broyden matrix.
            Default is infinity (ie., no rank reduction).
        iter : int, optional
            Number of iterations to make. If omitted (default), make as many
            as required to meet tolerances.
        verbose : bool, optional
            Print status to stdout on every iteration.
        maxiter : int, optional
            Maximum number of iterations to make. If more are needed to
            meet convergence, `NoConvergence` is raised.
        f_tol : float, optional
            Absolute tolerance (in max-norm) for the residual.
            If omitted, default is 6e-6.
        f_rtol : float, optional
            Relative tolerance for the residual. If omitted, not used.
        x_tol : float, optional
            Absolute minimum step size, as determined from the Jacobian
            approximation. If the step size is smaller than this, optimization
            is terminated as successful. If omitted, not used.
        x_rtol : float, optional
            Relative minimum step size. If omitted, not used.
        tol_norm : function(vector) -> scalar, optional
            Norm to use in convergence check. Default is the maximum norm.
        line_search : {None, 'armijo' (default), 'wolfe'}, optional
            Which type of a line search to use to determine the step size in the
            direction given by the Jacobian approximation. Defaults to 'armijo'.
        callback : function, optional
            Optional callback function. It is called on every iteration as
            ``callback(x, f)`` where `x` is the current solution and `f`
            the corresponding residual.
        
        Returns
        -------
        sol : ndarray
            An array (of similar array type as `x0`) containing the final solution.
        
        Raises
        ------
        NoConvergence
            When a solution was not found.
        
        Notes
        -----
        This algorithm implements the inverse Jacobian Quasi-Newton update
        
        .. math:: H_+ = H + (dx - H df) df^\dagger / ( df^\dagger df)
        
        corresponding to Broyden's second method.
        
        References
        ----------
        .. [1] B.A. van der Rotten, PhD thesis,
           "A limited memory Broyden method to solve high-dimensional
           systems of nonlinear equations". Mathematisch Instituut,
           Universiteit Leiden, The Netherlands (2003).
        
           http://www.math.leidenuniv.nl/scripties/Rotten.pdf
    
    brute(func, ranges, args=(), Ns=20, full_output=0, finish=<function fmin>, disp=False)
        Minimize a function over a given range by brute force.
        
        Uses the "brute force" method, i.e. computes the function's value
        at each point of a multidimensional grid of points, to find the global
        minimum of the function.
        
        The function is evaluated everywhere in the range with the datatype of the
        first call to the function, as enforced by the ``vectorize`` NumPy
        function.  The value and type of the function evaluation returned when
        ``full_output=True`` are affected in addition by the ``finish`` argument
        (see Notes).
        
        Parameters
        ----------
        func : callable
            The objective function to be minimized. Must be in the
            form ``f(x, *args)``, where ``x`` is the argument in
            the form of a 1-D array and ``args`` is a tuple of any
            additional fixed parameters needed to completely specify
            the function.
        ranges : tuple
            Each component of the `ranges` tuple must be either a
            "slice object" or a range tuple of the form ``(low, high)``.
            The program uses these to create the grid of points on which
            the objective function will be computed. See `Note 2` for
            more detail.
        args : tuple, optional
            Any additional fixed parameters needed to completely specify
            the function.
        Ns : int, optional
            Number of grid points along the axes, if not otherwise
            specified. See `Note2`.
        full_output : bool, optional
            If True, return the evaluation grid and the objective function's
            values on it.
        finish : callable, optional
            An optimization function that is called with the result of brute force
            minimization as initial guess.  `finish` should take `func` and
            the initial guess as positional arguments, and take `args` as
            keyword arguments.  It may additionally take `full_output`
            and/or `disp` as keyword arguments.  Use None if no "polishing"
            function is to be used. See Notes for more details.
        disp : bool, optional
            Set to True to print convergence messages.
        
        Returns
        -------
        x0 : ndarray
            A 1-D array containing the coordinates of a point at which the
            objective function had its minimum value. (See `Note 1` for
            which point is returned.)
        fval : float
            Function value at the point `x0`. (Returned when `full_output` is
            True.)
        grid : tuple
            Representation of the evaluation grid.  It has the same
            length as `x0`. (Returned when `full_output` is True.)
        Jout : ndarray
            Function values at each point of the evaluation
            grid, `i.e.`, ``Jout = func(*grid)``. (Returned
            when `full_output` is True.)
        
        See Also
        --------
        basinhopping, differential_evolution
        
        Notes
        -----
        *Note 1*: The program finds the gridpoint at which the lowest value
        of the objective function occurs.  If `finish` is None, that is the
        point returned.  When the global minimum occurs within (or not very far
        outside) the grid's boundaries, and the grid is fine enough, that
        point will be in the neighborhood of the global minimum.
        
        However, users often employ some other optimization program to
        "polish" the gridpoint values, `i.e.`, to seek a more precise
        (local) minimum near `brute's` best gridpoint.
        The `brute` function's `finish` option provides a convenient way to do
        that.  Any polishing program used must take `brute's` output as its
        initial guess as a positional argument, and take `brute's` input values
        for `args` as keyword arguments, otherwise an error will be raised.
        It may additionally take `full_output` and/or `disp` as keyword arguments.
        
        `brute` assumes that the `finish` function returns either an
        `OptimizeResult` object or a tuple in the form:
        ``(xmin, Jmin, ... , statuscode)``, where ``xmin`` is the minimizing
        value of the argument, ``Jmin`` is the minimum value of the objective
        function, "..." may be some other returned values (which are not used
        by `brute`), and ``statuscode`` is the status code of the `finish` program.
        
        Note that when `finish` is not None, the values returned are those
        of the `finish` program, *not* the gridpoint ones.  Consequently,
        while `brute` confines its search to the input grid points,
        the `finish` program's results usually will not coincide with any
        gridpoint, and may fall outside the grid's boundary. Thus, if a
        minimum only needs to be found over the provided grid points, make
        sure to pass in `finish=None`.
        
        *Note 2*: The grid of points is a `numpy.mgrid` object.
        For `brute` the `ranges` and `Ns` inputs have the following effect.
        Each component of the `ranges` tuple can be either a slice object or a
        two-tuple giving a range of values, such as (0, 5).  If the component is a
        slice object, `brute` uses it directly.  If the component is a two-tuple
        range, `brute` internally converts it to a slice object that interpolates
        `Ns` points from its low-value to its high-value, inclusive.
        
        Examples
        --------
        We illustrate the use of `brute` to seek the global minimum of a function
        of two variables that is given as the sum of a positive-definite
        quadratic and two deep "Gaussian-shaped" craters.  Specifically, define
        the objective function `f` as the sum of three other functions,
        ``f = f1 + f2 + f3``.  We suppose each of these has a signature
        ``(z, *params)``, where ``z = (x, y)``,  and ``params`` and the functions
        are as defined below.
        
        >>> params = (2, 3, 7, 8, 9, 10, 44, -1, 2, 26, 1, -2, 0.5)
        >>> def f1(z, *params):
        ...     x, y = z
        ...     a, b, c, d, e, f, g, h, i, j, k, l, scale = params
        ...     return (a * x**2 + b * x * y + c * y**2 + d*x + e*y + f)
        
        >>> def f2(z, *params):
        ...     x, y = z
        ...     a, b, c, d, e, f, g, h, i, j, k, l, scale = params
        ...     return (-g*np.exp(-((x-h)**2 + (y-i)**2) / scale))
        
        >>> def f3(z, *params):
        ...     x, y = z
        ...     a, b, c, d, e, f, g, h, i, j, k, l, scale = params
        ...     return (-j*np.exp(-((x-k)**2 + (y-l)**2) / scale))
        
        >>> def f(z, *params):
        ...     return f1(z, *params) + f2(z, *params) + f3(z, *params)
        
        Thus, the objective function may have local minima near the minimum
        of each of the three functions of which it is composed.  To
        use `fmin` to polish its gridpoint result, we may then continue as
        follows:
        
        >>> rranges = (slice(-4, 4, 0.25), slice(-4, 4, 0.25))
        >>> from scipy import optimize
        >>> resbrute = optimize.brute(f, rranges, args=params, full_output=True,
        ...                           finish=optimize.fmin)
        >>> resbrute[0]  # global minimum
        array([-1.05665192,  1.80834843])
        >>> resbrute[1]  # function value at global minimum
        -3.4085818767
        
        Note that if `finish` had been set to None, we would have gotten the
        gridpoint [-1.0 1.75] where the rounded function value is -2.892.
    
    check_grad(func, grad, x0, *args, **kwargs)
        Check the correctness of a gradient function by comparing it against a
        (forward) finite-difference approximation of the gradient.
        
        Parameters
        ----------
        func : callable ``func(x0, *args)``
            Function whose derivative is to be checked.
        grad : callable ``grad(x0, *args)``
            Gradient of `func`.
        x0 : ndarray
            Points to check `grad` against forward difference approximation of grad
            using `func`.
        args : \*args, optional
            Extra arguments passed to `func` and `grad`.
        epsilon : float, optional
            Step size used for the finite difference approximation. It defaults to
            ``sqrt(numpy.finfo(float).eps)``, which is approximately 1.49e-08.
        
        Returns
        -------
        err : float
            The square root of the sum of squares (i.e. the 2-norm) of the
            difference between ``grad(x0, *args)`` and the finite difference
            approximation of `grad` using func at the points `x0`.
        
        See Also
        --------
        approx_fprime
        
        Examples
        --------
        >>> def func(x):
        ...     return x[0]**2 - 0.5 * x[1]**3
        >>> def grad(x):
        ...     return [2 * x[0], -1.5 * x[1]**2]
        >>> from scipy.optimize import check_grad
        >>> check_grad(func, grad, [1.5, -1.5])
        2.9802322387695312e-08
    
    curve_fit(f, xdata, ydata, p0=None, sigma=None, absolute_sigma=False, check_finite=True, bounds=(-inf, inf), method=None, jac=None, **kwargs)
        Use non-linear least squares to fit a function, f, to data.
        
        Assumes ``ydata = f(xdata, *params) + eps``
        
        Parameters
        ----------
        f : callable
            The model function, f(x, ...).  It must take the independent
            variable as the first argument and the parameters to fit as
            separate remaining arguments.
        xdata : An M-length sequence or an (k,M)-shaped array
            for functions with k predictors.
            The independent variable where the data is measured.
        ydata : M-length sequence
            The dependent data --- nominally f(xdata, ...)
        p0 : None, scalar, or N-length sequence, optional
            Initial guess for the parameters.  If None, then the initial
            values will all be 1 (if the number of parameters for the function
            can be determined using introspection, otherwise a ValueError
            is raised).
        sigma : None or M-length sequence, optional
            If not None, the uncertainties in the ydata array. These are used as
            weights in the least-squares problem
            i.e. minimising ``np.sum( ((f(xdata, *popt) - ydata) / sigma)**2 )``
            If None, the uncertainties are assumed to be 1.
        absolute_sigma : bool, optional
            If False, `sigma` denotes relative weights of the data points.
            The returned covariance matrix `pcov` is based on *estimated*
            errors in the data, and is not affected by the overall
            magnitude of the values in `sigma`. Only the relative
            magnitudes of the `sigma` values matter.
        
            If True, `sigma` describes one standard deviation errors of
            the input data points. The estimated covariance in `pcov` is
            based on these values.
        check_finite : bool, optional
            If True, check that the input arrays do not contain nans of infs,
            and raise a ValueError if they do. Setting this parameter to
            False may silently produce nonsensical results if the input arrays
            do contain nans. Default is True.
        bounds : 2-tuple of array_like, optional
            Lower and upper bounds on independent variables. Defaults to no bounds.        
            Each element of the tuple must be either an array with the length equal
            to the number of parameters, or a scalar (in which case the bound is
            taken to be the same for all parameters.) Use ``np.inf`` with an
            appropriate sign to disable bounds on all or some parameters.
        
            .. versionadded:: 0.17
        method : {'lm', 'trf', 'dogbox'}, optional
            Method to use for optimization.  See `least_squares` for more details.
            Default is 'lm' for unconstrained problems and 'trf' if `bounds` are
            provided. The method 'lm' won't work when the number of observations
            is less than the number of variables, use 'trf' or 'dogbox' in this
            case.
        
            .. versionadded:: 0.17
        jac : callable, string or None, optional
            Function with signature ``jac(x, ...)`` which computes the Jacobian
            matrix of the model function with respect to parameters as a dense
            array_like structure. It will be scaled according to provided `sigma`.
            If None (default), the Jacobian will be estimated numerically.
            String keywords for 'trf' and 'dogbox' methods can be used to select
            a finite difference scheme, see `least_squares`.
        
            .. versionadded:: 0.18
        kwargs
            Keyword arguments passed to `leastsq` for ``method='lm'`` or
            `least_squares` otherwise.
        
        Returns
        -------
        popt : array
            Optimal values for the parameters so that the sum of the squared error
            of ``f(xdata, *popt) - ydata`` is minimized
        pcov : 2d array
            The estimated covariance of popt. The diagonals provide the variance
            of the parameter estimate. To compute one standard deviation errors
            on the parameters use ``perr = np.sqrt(np.diag(pcov))``.
        
            How the `sigma` parameter affects the estimated covariance
            depends on `absolute_sigma` argument, as described above.
        
            If the Jacobian matrix at the solution doesn't have a full rank, then
            'lm' method returns a matrix filled with ``np.inf``, on the other hand
            'trf'  and 'dogbox' methods use Moore-Penrose pseudoinverse to compute
            the covariance matrix.
        
        Raises
        ------
        ValueError
            if either `ydata` or `xdata` contain NaNs, or if incompatible options
            are used.
        
        RuntimeError
            if the least-squares minimization fails.
        
        OptimizeWarning
            if covariance of the parameters can not be estimated.
        
        See Also
        --------
        least_squares : Minimize the sum of squares of nonlinear functions.
        stats.linregress : Calculate a linear least squares regression for two sets
                           of measurements.
        
        Notes
        -----
        With ``method='lm'``, the algorithm uses the Levenberg-Marquardt algorithm
        through `leastsq`. Note that this algorithm can only deal with
        unconstrained problems.
        
        Box constraints can be handled by methods 'trf' and 'dogbox'. Refer to
        the docstring of `least_squares` for more information.
        
        Examples
        --------
        >>> import numpy as np
        >>> from scipy.optimize import curve_fit
        >>> def func(x, a, b, c):
        ...     return a * np.exp(-b * x) + c
        
        >>> xdata = np.linspace(0, 4, 50)
        >>> y = func(xdata, 2.5, 1.3, 0.5)
        >>> ydata = y + 0.2 * np.random.normal(size=len(xdata))
        
        >>> popt, pcov = curve_fit(func, xdata, ydata)
        
        Constrain the optimization to the region of ``0 < a < 3``, ``0 < b < 2``
        and ``0 < c < 1``:
        
        >>> popt, pcov = curve_fit(func, xdata, ydata, bounds=(0, [3., 2., 1.]))
    
    diagbroyden(F, xin, iter=None, alpha=None, verbose=False, maxiter=None, f_tol=None, f_rtol=None, x_tol=None, x_rtol=None, tol_norm=None, line_search='armijo', callback=None, **kw)
        Find a root of a function, using diagonal Broyden Jacobian approximation.
        
        The Jacobian approximation is derived from previous iterations, by
        retaining only the diagonal of Broyden matrices.
        
        .. warning::
        
           This algorithm may be useful for specific problems, but whether
           it will work may depend strongly on the problem.
        
        Parameters
        ----------
        F : function(x) -> f
            Function whose root to find; should take and return an array-like
            object.
        x0 : array_like
            Initial guess for the solution
        alpha : float, optional
            Initial guess for the Jacobian is (-1/alpha).
        iter : int, optional
            Number of iterations to make. If omitted (default), make as many
            as required to meet tolerances.
        verbose : bool, optional
            Print status to stdout on every iteration.
        maxiter : int, optional
            Maximum number of iterations to make. If more are needed to
            meet convergence, `NoConvergence` is raised.
        f_tol : float, optional
            Absolute tolerance (in max-norm) for the residual.
            If omitted, default is 6e-6.
        f_rtol : float, optional
            Relative tolerance for the residual. If omitted, not used.
        x_tol : float, optional
            Absolute minimum step size, as determined from the Jacobian
            approximation. If the step size is smaller than this, optimization
            is terminated as successful. If omitted, not used.
        x_rtol : float, optional
            Relative minimum step size. If omitted, not used.
        tol_norm : function(vector) -> scalar, optional
            Norm to use in convergence check. Default is the maximum norm.
        line_search : {None, 'armijo' (default), 'wolfe'}, optional
            Which type of a line search to use to determine the step size in the
            direction given by the Jacobian approximation. Defaults to 'armijo'.
        callback : function, optional
            Optional callback function. It is called on every iteration as
            ``callback(x, f)`` where `x` is the current solution and `f`
            the corresponding residual.
        
        Returns
        -------
        sol : ndarray
            An array (of similar array type as `x0`) containing the final solution.
        
        Raises
        ------
        NoConvergence
            When a solution was not found.
    
    differential_evolution(func, bounds, args=(), strategy='best1bin', maxiter=1000, popsize=15, tol=0.01, mutation=(0.5, 1), recombination=0.7, seed=None, callback=None, disp=False, polish=True, init='latinhypercube')
        Finds the global minimum of a multivariate function.
        Differential Evolution is stochastic in nature (does not use gradient
        methods) to find the minimium, and can search large areas of candidate
        space, but often requires larger numbers of function evaluations than
        conventional gradient based techniques.
        
        The algorithm is due to Storn and Price [1]_.
        
        Parameters
        ----------
        func : callable
            The objective function to be minimized.  Must be in the form
            ``f(x, *args)``, where ``x`` is the argument in the form of a 1-D array
            and ``args`` is a  tuple of any additional fixed parameters needed to
            completely specify the function.
        bounds : sequence
            Bounds for variables.  ``(min, max)`` pairs for each element in ``x``,
            defining the lower and upper bounds for the optimizing argument of
            `func`. It is required to have ``len(bounds) == len(x)``.
            ``len(bounds)`` is used to determine the number of parameters in ``x``.
        args : tuple, optional
            Any additional fixed parameters needed to
            completely specify the objective function.
        strategy : str, optional
            The differential evolution strategy to use. Should be one of:
        
                - 'best1bin'
                - 'best1exp'
                - 'rand1exp'
                - 'randtobest1exp'
                - 'best2exp'
                - 'rand2exp'
                - 'randtobest1bin'
                - 'best2bin'
                - 'rand2bin'
                - 'rand1bin'
        
            The default is 'best1bin'.
        maxiter : int, optional
            The maximum number of generations over which the entire population is
            evolved. The maximum number of function evaluations (with no polishing)
            is: ``(maxiter + 1) * popsize * len(x)``
        popsize : int, optional
            A multiplier for setting the total population size.  The population has
            ``popsize * len(x)`` individuals.
        tol : float, optional
            When the mean of the population energies, multiplied by tol,
            divided by the standard deviation of the population energies
            is greater than 1 the solving process terminates:
            ``convergence = mean(pop) * tol / stdev(pop) > 1``
        mutation : float or tuple(float, float), optional
            The mutation constant. In the literature this is also known as
            differential weight, being denoted by F.
            If specified as a float it should be in the range [0, 2].
            If specified as a tuple ``(min, max)`` dithering is employed. Dithering
            randomly changes the mutation constant on a generation by generation
            basis. The mutation constant for that generation is taken from
            ``U[min, max)``. Dithering can help speed convergence significantly.
            Increasing the mutation constant increases the search radius, but will
            slow down convergence.
        recombination : float, optional
            The recombination constant, should be in the range [0, 1]. In the
            literature this is also known as the crossover probability, being
            denoted by CR. Increasing this value allows a larger number of mutants
            to progress into the next generation, but at the risk of population
            stability.
        seed : int or `np.random.RandomState`, optional
            If `seed` is not specified the `np.RandomState` singleton is used.
            If `seed` is an int, a new `np.random.RandomState` instance is used,
            seeded with seed.
            If `seed` is already a `np.random.RandomState instance`, then that
            `np.random.RandomState` instance is used.
            Specify `seed` for repeatable minimizations.
        disp : bool, optional
            Display status messages
        callback : callable, `callback(xk, convergence=val)`, optional
            A function to follow the progress of the minimization. ``xk`` is
            the current value of ``x0``. ``val`` represents the fractional
            value of the population convergence.  When ``val`` is greater than one
            the function halts. If callback returns `True`, then the minimization
            is halted (any polishing is still carried out).
        polish : bool, optional
            If True (default), then `scipy.optimize.minimize` with the `L-BFGS-B`
            method is used to polish the best population member at the end, which
            can improve the minimization slightly.
        init : string, optional
            Specify how the population initialization is performed. Should be
            one of:
        
                - 'latinhypercube'
                - 'random'
        
            The default is 'latinhypercube'. Latin Hypercube sampling tries to
            maximize coverage of the available parameter space. 'random' initializes
            the population randomly - this has the drawback that clustering can
            occur, preventing the whole of parameter space being covered.
        
        Returns
        -------
        res : OptimizeResult
            The optimization result represented as a `OptimizeResult` object.
            Important attributes are: ``x`` the solution array, ``success`` a
            Boolean flag indicating if the optimizer exited successfully and
            ``message`` which describes the cause of the termination. See
            `OptimizeResult` for a description of other attributes.  If `polish`
            was employed, and a lower minimum was obtained by the polishing, then
            OptimizeResult also contains the ``jac`` attribute.
        
        Notes
        -----
        Differential evolution is a stochastic population based method that is
        useful for global optimization problems. At each pass through the population
        the algorithm mutates each candidate solution by mixing with other candidate
        solutions to create a trial candidate. There are several strategies [2]_ for
        creating trial candidates, which suit some problems more than others. The
        'best1bin' strategy is a good starting point for many systems. In this
        strategy two members of the population are randomly chosen. Their difference
        is used to mutate the best member (the `best` in `best1bin`), :math:`b_0`,
        so far:
        
        .. math::
        
            b' = b_0 + mutation * (population[rand0] - population[rand1])
        
        A trial vector is then constructed. Starting with a randomly chosen 'i'th
        parameter the trial is sequentially filled (in modulo) with parameters from
        `b'` or the original candidate. The choice of whether to use `b'` or the
        original candidate is made with a binomial distribution (the 'bin' in
        'best1bin') - a random number in [0, 1) is generated.  If this number is
        less than the `recombination` constant then the parameter is loaded from
        `b'`, otherwise it is loaded from the original candidate.  The final
        parameter is always loaded from `b'`.  Once the trial candidate is built
        its fitness is assessed. If the trial is better than the original candidate
        then it takes its place. If it is also better than the best overall
        candidate it also replaces that.
        To improve your chances of finding a global minimum use higher `popsize`
        values, with higher `mutation` and (dithering), but lower `recombination`
        values. This has the effect of widening the search radius, but slowing
        convergence.
        
        .. versionadded:: 0.15.0
        
        Examples
        --------
        Let us consider the problem of minimizing the Rosenbrock function. This
        function is implemented in `rosen` in `scipy.optimize`.
        
        >>> from scipy.optimize import rosen, differential_evolution
        >>> bounds = [(0,2), (0, 2), (0, 2), (0, 2), (0, 2)]
        >>> result = differential_evolution(rosen, bounds)
        >>> result.x, result.fun
        (array([1., 1., 1., 1., 1.]), 1.9216496320061384e-19)
        
        Next find the minimum of the Ackley function
        (http://en.wikipedia.org/wiki/Test_functions_for_optimization).
        
        >>> from scipy.optimize import differential_evolution
        >>> import numpy as np
        >>> def ackley(x):
        ...     arg1 = -0.2 * np.sqrt(0.5 * (x[0] ** 2 + x[1] ** 2))
        ...     arg2 = 0.5 * (np.cos(2. * np.pi * x[0]) + np.cos(2. * np.pi * x[1]))
        ...     return -20. * np.exp(arg1) - np.exp(arg2) + 20. + np.e
        >>> bounds = [(-5, 5), (-5, 5)]
        >>> result = differential_evolution(ackley, bounds)
        >>> result.x, result.fun
        (array([ 0.,  0.]), 4.4408920985006262e-16)
        
        References
        ----------
        .. [1] Storn, R and Price, K, Differential Evolution - a Simple and
               Efficient Heuristic for Global Optimization over Continuous Spaces,
               Journal of Global Optimization, 1997, 11, 341 - 359.
        .. [2] http://www1.icsi.berkeley.edu/~storn/code.html
        .. [3] http://en.wikipedia.org/wiki/Differential_evolution
    
    excitingmixing(F, xin, iter=None, alpha=None, alphamax=1.0, verbose=False, maxiter=None, f_tol=None, f_rtol=None, x_tol=None, x_rtol=None, tol_norm=None, line_search='armijo', callback=None, **kw)
        Find a root of a function, using a tuned diagonal Jacobian approximation.
        
        The Jacobian matrix is diagonal and is tuned on each iteration.
        
        .. warning::
        
           This algorithm may be useful for specific problems, but whether
           it will work may depend strongly on the problem.
        
        Parameters
        ----------
        F : function(x) -> f
            Function whose root to find; should take and return an array-like
            object.
        x0 : array_like
            Initial guess for the solution
        alpha : float, optional
            Initial Jacobian approximation is (-1/alpha).
        alphamax : float, optional
            The entries of the diagonal Jacobian are kept in the range
            ``[alpha, alphamax]``.
        iter : int, optional
            Number of iterations to make. If omitted (default), make as many
            as required to meet tolerances.
        verbose : bool, optional
            Print status to stdout on every iteration.
        maxiter : int, optional
            Maximum number of iterations to make. If more are needed to
            meet convergence, `NoConvergence` is raised.
        f_tol : float, optional
            Absolute tolerance (in max-norm) for the residual.
            If omitted, default is 6e-6.
        f_rtol : float, optional
            Relative tolerance for the residual. If omitted, not used.
        x_tol : float, optional
            Absolute minimum step size, as determined from the Jacobian
            approximation. If the step size is smaller than this, optimization
            is terminated as successful. If omitted, not used.
        x_rtol : float, optional
            Relative minimum step size. If omitted, not used.
        tol_norm : function(vector) -> scalar, optional
            Norm to use in convergence check. Default is the maximum norm.
        line_search : {None, 'armijo' (default), 'wolfe'}, optional
            Which type of a line search to use to determine the step size in the
            direction given by the Jacobian approximation. Defaults to 'armijo'.
        callback : function, optional
            Optional callback function. It is called on every iteration as
            ``callback(x, f)`` where `x` is the current solution and `f`
            the corresponding residual.
        
        Returns
        -------
        sol : ndarray
            An array (of similar array type as `x0`) containing the final solution.
        
        Raises
        ------
        NoConvergence
            When a solution was not found.
    
    fixed_point(func, x0, args=(), xtol=1e-08, maxiter=500, method='del2')
        Find a fixed point of the function.
        
        Given a function of one or more variables and a starting point, find a
        fixed-point of the function: i.e. where ``func(x0) == x0``.
        
        Parameters
        ----------
        func : function
            Function to evaluate.
        x0 : array_like
            Fixed point of function.
        args : tuple, optional
            Extra arguments to `func`.
        xtol : float, optional
            Convergence tolerance, defaults to 1e-08.
        maxiter : int, optional
            Maximum number of iterations, defaults to 500.
        method : {"del2", "iteration"}, optional
            Method of finding the fixed-point, defaults to "del2"
            which uses Steffensen's Method with Aitken's ``Del^2``
            convergence acceleration [1]_. The "iteration" method simply iterates
            the function until convergence is detected, without attempting to
            accelerate the convergence.
        
        References
        ----------
        .. [1] Burden, Faires, "Numerical Analysis", 5th edition, pg. 80
        
        Examples
        --------
        >>> from scipy import optimize
        >>> def func(x, c1, c2):
        ...    return np.sqrt(c1/(x+c2))
        >>> c1 = np.array([10,12.])
        >>> c2 = np.array([3, 5.])
        >>> optimize.fixed_point(func, [1.2, 1.3], args=(c1,c2))
        array([ 1.4920333 ,  1.37228132])
    
    fmin(func, x0, args=(), xtol=0.0001, ftol=0.0001, maxiter=None, maxfun=None, full_output=0, disp=1, retall=0, callback=None, initial_simplex=None)
        Minimize a function using the downhill simplex algorithm.
        
        This algorithm only uses function values, not derivatives or second
        derivatives.
        
        Parameters
        ----------
        func : callable func(x,*args)
            The objective function to be minimized.
        x0 : ndarray
            Initial guess.
        args : tuple, optional
            Extra arguments passed to func, i.e. ``f(x,*args)``.
        xtol : float, optional
            Absolute error in xopt between iterations that is acceptable for
            convergence.
        ftol : number, optional
            Absolute error in func(xopt) between iterations that is acceptable for
            convergence.
        maxiter : int, optional
            Maximum number of iterations to perform.
        maxfun : number, optional
            Maximum number of function evaluations to make.
        full_output : bool, optional
            Set to True if fopt and warnflag outputs are desired.
        disp : bool, optional
            Set to True to print convergence messages.
        retall : bool, optional
            Set to True to return list of solutions at each iteration.
        callback : callable, optional
            Called after each iteration, as callback(xk), where xk is the
            current parameter vector.
        initial_simplex : array_like of shape (N + 1, N), optional
            Initial simplex. If given, overrides `x0`.
            ``initial_simplex[j,:]`` should contain the coordinates of
            the j-th vertex of the ``N+1`` vertices in the simplex, where
            ``N`` is the dimension.
        
        Returns
        -------
        xopt : ndarray
            Parameter that minimizes function.
        fopt : float
            Value of function at minimum: ``fopt = func(xopt)``.
        iter : int
            Number of iterations performed.
        funcalls : int
            Number of function calls made.
        warnflag : int
            1 : Maximum number of function evaluations made.
            2 : Maximum number of iterations reached.
        allvecs : list
            Solution at each iteration.
        
        See also
        --------
        minimize: Interface to minimization algorithms for multivariate
            functions. See the 'Nelder-Mead' `method` in particular.
        
        Notes
        -----
        Uses a Nelder-Mead simplex algorithm to find the minimum of function of
        one or more variables.
        
        This algorithm has a long history of successful use in applications.
        But it will usually be slower than an algorithm that uses first or
        second derivative information. In practice it can have poor
        performance in high-dimensional problems and is not robust to
        minimizing complicated functions. Additionally, there currently is no
        complete theory describing when the algorithm will successfully
        converge to the minimum, or how fast it will if it does. Both the ftol and
        xtol criteria must be met for convergence.
        
        References
        ----------
        .. [1] Nelder, J.A. and Mead, R. (1965), "A simplex method for function
               minimization", The Computer Journal, 7, pp. 308-313
        
        .. [2] Wright, M.H. (1996), "Direct Search Methods: Once Scorned, Now
               Respectable", in Numerical Analysis 1995, Proceedings of the
               1995 Dundee Biennial Conference in Numerical Analysis, D.F.
               Griffiths and G.A. Watson (Eds.), Addison Wesley Longman,
               Harlow, UK, pp. 191-208.
    
    fmin_bfgs(f, x0, fprime=None, args=(), gtol=1e-05, norm=inf, epsilon=1.4901161193847656e-08, maxiter=None, full_output=0, disp=1, retall=0, callback=None)
        Minimize a function using the BFGS algorithm.
        
        Parameters
        ----------
        f : callable f(x,*args)
            Objective function to be minimized.
        x0 : ndarray
            Initial guess.
        fprime : callable f'(x,*args), optional
            Gradient of f.
        args : tuple, optional
            Extra arguments passed to f and fprime.
        gtol : float, optional
            Gradient norm must be less than gtol before successful termination.
        norm : float, optional
            Order of norm (Inf is max, -Inf is min)
        epsilon : int or ndarray, optional
            If fprime is approximated, use this value for the step size.
        callback : callable, optional
            An optional user-supplied function to call after each
            iteration.  Called as callback(xk), where xk is the
            current parameter vector.
        maxiter : int, optional
            Maximum number of iterations to perform.
        full_output : bool, optional
            If True,return fopt, func_calls, grad_calls, and warnflag
            in addition to xopt.
        disp : bool, optional
            Print convergence message if True.
        retall : bool, optional
            Return a list of results at each iteration if True.
        
        Returns
        -------
        xopt : ndarray
            Parameters which minimize f, i.e. f(xopt) == fopt.
        fopt : float
            Minimum value.
        gopt : ndarray
            Value of gradient at minimum, f'(xopt), which should be near 0.
        Bopt : ndarray
            Value of 1/f''(xopt), i.e. the inverse hessian matrix.
        func_calls : int
            Number of function_calls made.
        grad_calls : int
            Number of gradient calls made.
        warnflag : integer
            1 : Maximum number of iterations exceeded.
            2 : Gradient and/or function calls not changing.
        allvecs  :  list
            `OptimizeResult` at each iteration.  Only returned if retall is True.
        
        See also
        --------
        minimize: Interface to minimization algorithms for multivariate
            functions. See the 'BFGS' `method` in particular.
        
        Notes
        -----
        Optimize the function, f, whose gradient is given by fprime
        using the quasi-Newton method of Broyden, Fletcher, Goldfarb,
        and Shanno (BFGS)
        
        References
        ----------
        Wright, and Nocedal 'Numerical Optimization', 1999, pg. 198.
    
    fmin_cg(f, x0, fprime=None, args=(), gtol=1e-05, norm=inf, epsilon=1.4901161193847656e-08, maxiter=None, full_output=0, disp=1, retall=0, callback=None)
        Minimize a function using a nonlinear conjugate gradient algorithm.
        
        Parameters
        ----------
        f : callable, ``f(x, *args)``
            Objective function to be minimized.  Here `x` must be a 1-D array of
            the variables that are to be changed in the search for a minimum, and
            `args` are the other (fixed) parameters of `f`.
        x0 : ndarray
            A user-supplied initial estimate of `xopt`, the optimal value of `x`.
            It must be a 1-D array of values.
        fprime : callable, ``fprime(x, *args)``, optional
            A function that returns the gradient of `f` at `x`. Here `x` and `args`
            are as described above for `f`. The returned value must be a 1-D array.
            Defaults to None, in which case the gradient is approximated
            numerically (see `epsilon`, below).
        args : tuple, optional
            Parameter values passed to `f` and `fprime`. Must be supplied whenever
            additional fixed parameters are needed to completely specify the
            functions `f` and `fprime`.
        gtol : float, optional
            Stop when the norm of the gradient is less than `gtol`.
        norm : float, optional
            Order to use for the norm of the gradient
            (``-np.Inf`` is min, ``np.Inf`` is max).
        epsilon : float or ndarray, optional
            Step size(s) to use when `fprime` is approximated numerically. Can be a
            scalar or a 1-D array.  Defaults to ``sqrt(eps)``, with eps the
            floating point machine precision.  Usually ``sqrt(eps)`` is about
            1.5e-8.
        maxiter : int, optional
            Maximum number of iterations to perform. Default is ``200 * len(x0)``.
        full_output : bool, optional
            If True, return `fopt`, `func_calls`, `grad_calls`, and `warnflag` in
            addition to `xopt`.  See the Returns section below for additional
            information on optional return values.
        disp : bool, optional
            If True, return a convergence message, followed by `xopt`.
        retall : bool, optional
            If True, add to the returned values the results of each iteration.
        callback : callable, optional
            An optional user-supplied function, called after each iteration.
            Called as ``callback(xk)``, where ``xk`` is the current value of `x0`.
        
        Returns
        -------
        xopt : ndarray
            Parameters which minimize f, i.e. ``f(xopt) == fopt``.
        fopt : float, optional
            Minimum value found, f(xopt).  Only returned if `full_output` is True.
        func_calls : int, optional
            The number of function_calls made.  Only returned if `full_output`
            is True.
        grad_calls : int, optional
            The number of gradient calls made. Only returned if `full_output` is
            True.
        warnflag : int, optional
            Integer value with warning status, only returned if `full_output` is
            True.
        
            0 : Success.
        
            1 : The maximum number of iterations was exceeded.
        
            2 : Gradient and/or function calls were not changing.  May indicate
                that precision was lost, i.e., the routine did not converge.
        
        allvecs : list of ndarray, optional
            List of arrays, containing the results at each iteration.
            Only returned if `retall` is True.
        
        See Also
        --------
        minimize : common interface to all `scipy.optimize` algorithms for
                   unconstrained and constrained minimization of multivariate
                   functions.  It provides an alternative way to call
                   ``fmin_cg``, by specifying ``method='CG'``.
        
        Notes
        -----
        This conjugate gradient algorithm is based on that of Polak and Ribiere
        [1]_.
        
        Conjugate gradient methods tend to work better when:
        
        1. `f` has a unique global minimizing point, and no local minima or
           other stationary points,
        2. `f` is, at least locally, reasonably well approximated by a
           quadratic function of the variables,
        3. `f` is continuous and has a continuous gradient,
        4. `fprime` is not too large, e.g., has a norm less than 1000,
        5. The initial guess, `x0`, is reasonably close to `f` 's global
           minimizing point, `xopt`.
        
        References
        ----------
        .. [1] Wright & Nocedal, "Numerical Optimization", 1999, pp. 120-122.
        
        Examples
        --------
        Example 1: seek the minimum value of the expression
        ``a*u**2 + b*u*v + c*v**2 + d*u + e*v + f`` for given values
        of the parameters and an initial guess ``(u, v) = (0, 0)``.
        
        >>> args = (2, 3, 7, 8, 9, 10)  # parameter values
        >>> def f(x, *args):
        ...     u, v = x
        ...     a, b, c, d, e, f = args
        ...     return a*u**2 + b*u*v + c*v**2 + d*u + e*v + f
        >>> def gradf(x, *args):
        ...     u, v = x
        ...     a, b, c, d, e, f = args
        ...     gu = 2*a*u + b*v + d     # u-component of the gradient
        ...     gv = b*u + 2*c*v + e     # v-component of the gradient
        ...     return np.asarray((gu, gv))
        >>> x0 = np.asarray((0, 0))  # Initial guess.
        >>> from scipy import optimize
        >>> res1 = optimize.fmin_cg(f, x0, fprime=gradf, args=args)
        Optimization terminated successfully.
                 Current function value: 1.617021
                 Iterations: 4
                 Function evaluations: 8
                 Gradient evaluations: 8
        >>> res1
        array([-1.80851064, -0.25531915])
        
        Example 2: solve the same problem using the `minimize` function.
        (This `myopts` dictionary shows all of the available options,
        although in practice only non-default values would be needed.
        The returned value will be a dictionary.)
        
        >>> opts = {'maxiter' : None,    # default value.
        ...         'disp' : True,    # non-default value.
        ...         'gtol' : 1e-5,    # default value.
        ...         'norm' : np.inf,  # default value.
        ...         'eps' : 1.4901161193847656e-08}  # default value.
        >>> res2 = optimize.minimize(f, x0, jac=gradf, args=args,
        ...                          method='CG', options=opts)
        Optimization terminated successfully.
                Current function value: 1.617021
                Iterations: 4
                Function evaluations: 8
                Gradient evaluations: 8
        >>> res2.x  # minimum found
        array([-1.80851064, -0.25531915])
    
    fmin_cobyla(func, x0, cons, args=(), consargs=None, rhobeg=1.0, rhoend=0.0001, iprint=1, maxfun=1000, disp=None, catol=0.0002)
        Minimize a function using the Constrained Optimization BY Linear
        Approximation (COBYLA) method. This method wraps a FORTRAN
        implementation of the algorithm.
        
        Parameters
        ----------
        func : callable
            Function to minimize. In the form func(x, \*args).
        x0 : ndarray
            Initial guess.
        cons : sequence
            Constraint functions; must all be ``>=0`` (a single function
            if only 1 constraint). Each function takes the parameters `x`
            as its first argument, and it can return either a single number or
            an array or list of numbers.
        args : tuple, optional
            Extra arguments to pass to function.
        consargs : tuple, optional
            Extra arguments to pass to constraint functions (default of None means
            use same extra arguments as those passed to func).
            Use ``()`` for no extra arguments.
        rhobeg : float, optional
            Reasonable initial changes to the variables.
        rhoend : float, optional
            Final accuracy in the optimization (not precisely guaranteed). This
            is a lower bound on the size of the trust region.
        iprint : {0, 1, 2, 3}, optional
            Controls the frequency of output; 0 implies no output.  Deprecated.
        disp : {0, 1, 2, 3}, optional
            Over-rides the iprint interface.  Preferred.
        maxfun : int, optional
            Maximum number of function evaluations.
        catol : float, optional
            Absolute tolerance for constraint violations.
        
        Returns
        -------
        x : ndarray
            The argument that minimises `f`.
        
        See also
        --------
        minimize: Interface to minimization algorithms for multivariate
            functions. See the 'COBYLA' `method` in particular.
        
        Notes
        -----
        This algorithm is based on linear approximations to the objective
        function and each constraint. We briefly describe the algorithm.
        
        Suppose the function is being minimized over k variables. At the
        jth iteration the algorithm has k+1 points v_1, ..., v_(k+1),
        an approximate solution x_j, and a radius RHO_j.
        (i.e. linear plus a constant) approximations to the objective
        function and constraint functions such that their function values
        agree with the linear approximation on the k+1 points v_1,.., v_(k+1).
        This gives a linear program to solve (where the linear approximations
        of the constraint functions are constrained to be non-negative).
        
        However the linear approximations are likely only good
        approximations near the current simplex, so the linear program is
        given the further requirement that the solution, which
        will become x_(j+1), must be within RHO_j from x_j. RHO_j only
        decreases, never increases. The initial RHO_j is rhobeg and the
        final RHO_j is rhoend. In this way COBYLA's iterations behave
        like a trust region algorithm.
        
        Additionally, the linear program may be inconsistent, or the
        approximation may give poor improvement. For details about
        how these issues are resolved, as well as how the points v_i are
        updated, refer to the source code or the references below.
        
        
        References
        ----------
        Powell M.J.D. (1994), "A direct search optimization method that models
        the objective and constraint functions by linear interpolation.", in
        Advances in Optimization and Numerical Analysis, eds. S. Gomez and
        J-P Hennart, Kluwer Academic (Dordrecht), pp. 51-67
        
        Powell M.J.D. (1998), "Direct search algorithms for optimization
        calculations", Acta Numerica 7, 287-336
        
        Powell M.J.D. (2007), "A view of algorithms for optimization without
        derivatives", Cambridge University Technical Report DAMTP 2007/NA03
        
        
        Examples
        --------
        Minimize the objective function f(x,y) = x*y subject
        to the constraints x**2 + y**2 < 1 and y > 0::
        
            >>> def objective(x):
            ...     return x[0]*x[1]
            ...
            >>> def constr1(x):
            ...     return 1 - (x[0]**2 + x[1]**2)
            ...
            >>> def constr2(x):
            ...     return x[1]
            ...
            >>> from scipy.optimize import fmin_cobyla
            >>> fmin_cobyla(objective, [0.0, 0.1], [constr1, constr2], rhoend=1e-7)
            array([-0.70710685,  0.70710671])
        
        The exact solution is (-sqrt(2)/2, sqrt(2)/2).
    
    fmin_l_bfgs_b(func, x0, fprime=None, args=(), approx_grad=0, bounds=None, m=10, factr=10000000.0, pgtol=1e-05, epsilon=1e-08, iprint=-1, maxfun=15000, maxiter=15000, disp=None, callback=None, maxls=20)
        Minimize a function func using the L-BFGS-B algorithm.
        
        Parameters
        ----------
        func : callable f(x,*args)
            Function to minimise.
        x0 : ndarray
            Initial guess.
        fprime : callable fprime(x,*args), optional
            The gradient of `func`.  If None, then `func` returns the function
            value and the gradient (``f, g = func(x, *args)``), unless
            `approx_grad` is True in which case `func` returns only ``f``.
        args : sequence, optional
            Arguments to pass to `func` and `fprime`.
        approx_grad : bool, optional
            Whether to approximate the gradient numerically (in which case
            `func` returns only the function value).
        bounds : list, optional
            ``(min, max)`` pairs for each element in ``x``, defining
            the bounds on that parameter. Use None or +-inf for one of ``min`` or
            ``max`` when there is no bound in that direction.
        m : int, optional
            The maximum number of variable metric corrections
            used to define the limited memory matrix. (The limited memory BFGS
            method does not store the full hessian but uses this many terms in an
            approximation to it.)
        factr : float, optional
            The iteration stops when
            ``(f^k - f^{k+1})/max{|f^k|,|f^{k+1}|,1} <= factr * eps``,
            where ``eps`` is the machine precision, which is automatically
            generated by the code. Typical values for `factr` are: 1e12 for
            low accuracy; 1e7 for moderate accuracy; 10.0 for extremely
            high accuracy.
        pgtol : float, optional
            The iteration will stop when
            ``max{|proj g_i | i = 1, ..., n} <= pgtol``
            where ``pg_i`` is the i-th component of the projected gradient.
        epsilon : float, optional
            Step size used when `approx_grad` is True, for numerically
            calculating the gradient
        iprint : int, optional
            Controls the frequency of output. ``iprint < 0`` means no output;
            ``iprint = 0``    print only one line at the last iteration;
            ``0 < iprint < 99`` print also f and ``|proj g|`` every iprint iterations;
            ``iprint = 99``   print details of every iteration except n-vectors;
            ``iprint = 100``  print also the changes of active set and final x;
            ``iprint > 100``  print details of every iteration including x and g.
        disp : int, optional
            If zero, then no output.  If a positive number, then this over-rides
            `iprint` (i.e., `iprint` gets the value of `disp`).
        maxfun : int, optional
            Maximum number of function evaluations.
        maxiter : int, optional
            Maximum number of iterations.
        callback : callable, optional
            Called after each iteration, as ``callback(xk)``, where ``xk`` is the
            current parameter vector.
        maxls : int, optional
            Maximum number of line search steps (per iteration). Default is 20.
        
        Returns
        -------
        x : array_like
            Estimated position of the minimum.
        f : float
            Value of `func` at the minimum.
        d : dict
            Information dictionary.
        
            * d['warnflag'] is
        
              - 0 if converged,
              - 1 if too many function evaluations or too many iterations,
              - 2 if stopped for another reason, given in d['task']
        
            * d['grad'] is the gradient at the minimum (should be 0 ish)
            * d['funcalls'] is the number of function calls made.
            * d['nit'] is the number of iterations.
        
        See also
        --------
        minimize: Interface to minimization algorithms for multivariate
            functions. See the 'L-BFGS-B' `method` in particular.
        
        Notes
        -----
        License of L-BFGS-B (FORTRAN code):
        
        The version included here (in fortran code) is 3.0
        (released April 25, 2011).  It was written by Ciyou Zhu, Richard Byrd,
        and Jorge Nocedal <nocedal@ece.nwu.edu>. It carries the following
        condition for use:
        
        This software is freely available, but we expect that all publications
        describing work using this software, or all commercial products using it,
        quote at least one of the references given below. This software is released
        under the BSD License.
        
        References
        ----------
        * R. H. Byrd, P. Lu and J. Nocedal. A Limited Memory Algorithm for Bound
          Constrained Optimization, (1995), SIAM Journal on Scientific and
          Statistical Computing, 16, 5, pp. 1190-1208.
        * C. Zhu, R. H. Byrd and J. Nocedal. L-BFGS-B: Algorithm 778: L-BFGS-B,
          FORTRAN routines for large scale bound constrained optimization (1997),
          ACM Transactions on Mathematical Software, 23, 4, pp. 550 - 560.
        * J.L. Morales and J. Nocedal. L-BFGS-B: Remark on Algorithm 778: L-BFGS-B,
          FORTRAN routines for large scale bound constrained optimization (2011),
          ACM Transactions on Mathematical Software, 38, 1.
    
    fmin_ncg(f, x0, fprime, fhess_p=None, fhess=None, args=(), avextol=1e-05, epsilon=1.4901161193847656e-08, maxiter=None, full_output=0, disp=1, retall=0, callback=None)
        Unconstrained minimization of a function using the Newton-CG method.
        
        Parameters
        ----------
        f : callable ``f(x, *args)``
            Objective function to be minimized.
        x0 : ndarray
            Initial guess.
        fprime : callable ``f'(x, *args)``
            Gradient of f.
        fhess_p : callable ``fhess_p(x, p, *args)``, optional
            Function which computes the Hessian of f times an
            arbitrary vector, p.
        fhess : callable ``fhess(x, *args)``, optional
            Function to compute the Hessian matrix of f.
        args : tuple, optional
            Extra arguments passed to f, fprime, fhess_p, and fhess
            (the same set of extra arguments is supplied to all of
            these functions).
        epsilon : float or ndarray, optional
            If fhess is approximated, use this value for the step size.
        callback : callable, optional
            An optional user-supplied function which is called after
            each iteration.  Called as callback(xk), where xk is the
            current parameter vector.
        avextol : float, optional
            Convergence is assumed when the average relative error in
            the minimizer falls below this amount.
        maxiter : int, optional
            Maximum number of iterations to perform.
        full_output : bool, optional
            If True, return the optional outputs.
        disp : bool, optional
            If True, print convergence message.
        retall : bool, optional
            If True, return a list of results at each iteration.
        
        Returns
        -------
        xopt : ndarray
            Parameters which minimize f, i.e. ``f(xopt) == fopt``.
        fopt : float
            Value of the function at xopt, i.e. ``fopt = f(xopt)``.
        fcalls : int
            Number of function calls made.
        gcalls : int
            Number of gradient calls made.
        hcalls : int
            Number of hessian calls made.
        warnflag : int
            Warnings generated by the algorithm.
            1 : Maximum number of iterations exceeded.
        allvecs : list
            The result at each iteration, if retall is True (see below).
        
        See also
        --------
        minimize: Interface to minimization algorithms for multivariate
            functions. See the 'Newton-CG' `method` in particular.
        
        Notes
        -----
        Only one of `fhess_p` or `fhess` need to be given.  If `fhess`
        is provided, then `fhess_p` will be ignored.  If neither `fhess`
        nor `fhess_p` is provided, then the hessian product will be
        approximated using finite differences on `fprime`. `fhess_p`
        must compute the hessian times an arbitrary vector. If it is not
        given, finite-differences on `fprime` are used to compute
        it.
        
        Newton-CG methods are also called truncated Newton methods. This
        function differs from scipy.optimize.fmin_tnc because
        
        1. scipy.optimize.fmin_ncg is written purely in python using numpy
            and scipy while scipy.optimize.fmin_tnc calls a C function.
        2. scipy.optimize.fmin_ncg is only for unconstrained minimization
            while scipy.optimize.fmin_tnc is for unconstrained minimization
            or box constrained minimization. (Box constraints give
            lower and upper bounds for each variable separately.)
        
        References
        ----------
        Wright & Nocedal, 'Numerical Optimization', 1999, pg. 140.
    
    fmin_powell(func, x0, args=(), xtol=0.0001, ftol=0.0001, maxiter=None, maxfun=None, full_output=0, disp=1, retall=0, callback=None, direc=None)
        Minimize a function using modified Powell's method. This method
        only uses function values, not derivatives.
        
        Parameters
        ----------
        func : callable f(x,*args)
            Objective function to be minimized.
        x0 : ndarray
            Initial guess.
        args : tuple, optional
            Extra arguments passed to func.
        callback : callable, optional
            An optional user-supplied function, called after each
            iteration.  Called as ``callback(xk)``, where ``xk`` is the
            current parameter vector.
        direc : ndarray, optional
            Initial direction set.
        xtol : float, optional
            Line-search error tolerance.
        ftol : float, optional
            Relative error in ``func(xopt)`` acceptable for convergence.
        maxiter : int, optional
            Maximum number of iterations to perform.
        maxfun : int, optional
            Maximum number of function evaluations to make.
        full_output : bool, optional
            If True, fopt, xi, direc, iter, funcalls, and
            warnflag are returned.
        disp : bool, optional
            If True, print convergence messages.
        retall : bool, optional
            If True, return a list of the solution at each iteration.
        
        Returns
        -------
        xopt : ndarray
            Parameter which minimizes `func`.
        fopt : number
            Value of function at minimum: ``fopt = func(xopt)``.
        direc : ndarray
            Current direction set.
        iter : int
            Number of iterations.
        funcalls : int
            Number of function calls made.
        warnflag : int
            Integer warning flag:
                1 : Maximum number of function evaluations.
                2 : Maximum number of iterations.
        allvecs : list
            List of solutions at each iteration.
        
        See also
        --------
        minimize: Interface to unconstrained minimization algorithms for
            multivariate functions. See the 'Powell' `method` in particular.
        
        Notes
        -----
        Uses a modification of Powell's method to find the minimum of
        a function of N variables. Powell's method is a conjugate
        direction method.
        
        The algorithm has two loops. The outer loop
        merely iterates over the inner loop. The inner loop minimizes
        over each current direction in the direction set. At the end
        of the inner loop, if certain conditions are met, the direction
        that gave the largest decrease is dropped and replaced with
        the difference between the current estimated x and the estimated
        x from the beginning of the inner-loop.
        
        The technical conditions for replacing the direction of greatest
        increase amount to checking that
        
        1. No further gain can be made along the direction of greatest increase
           from that iteration.
        2. The direction of greatest increase accounted for a large sufficient
           fraction of the decrease in the function value from that iteration of
           the inner loop.
        
        References
        ----------
        Powell M.J.D. (1964) An efficient method for finding the minimum of a
        function of several variables without calculating derivatives,
        Computer Journal, 7 (2):155-162.
        
        Press W., Teukolsky S.A., Vetterling W.T., and Flannery B.P.:
        Numerical Recipes (any edition), Cambridge University Press
    
    fmin_slsqp(func, x0, eqcons=(), f_eqcons=None, ieqcons=(), f_ieqcons=None, bounds=(), fprime=None, fprime_eqcons=None, fprime_ieqcons=None, args=(), iter=100, acc=1e-06, iprint=1, disp=None, full_output=0, epsilon=1.4901161193847656e-08, callback=None)
        Minimize a function using Sequential Least SQuares Programming
        
        Python interface function for the SLSQP Optimization subroutine
        originally implemented by Dieter Kraft.
        
        Parameters
        ----------
        func : callable f(x,*args)
            Objective function.
        x0 : 1-D ndarray of float
            Initial guess for the independent variable(s).
        eqcons : list, optional
            A list of functions of length n such that
            eqcons[j](x,*args) == 0.0 in a successfully optimized
            problem.
        f_eqcons : callable f(x,*args), optional
            Returns a 1-D array in which each element must equal 0.0 in a
            successfully optimized problem.  If f_eqcons is specified,
            eqcons is ignored.
        ieqcons : list, optional
            A list of functions of length n such that
            ieqcons[j](x,*args) >= 0.0 in a successfully optimized
            problem.
        f_ieqcons : callable f(x,*args), optional
            Returns a 1-D ndarray in which each element must be greater or
            equal to 0.0 in a successfully optimized problem.  If
            f_ieqcons is specified, ieqcons is ignored.
        bounds : list, optional
            A list of tuples specifying the lower and upper bound
            for each independent variable [(xl0, xu0),(xl1, xu1),...]
            Infinite values will be interpreted as large floating values.
        fprime : callable `f(x,*args)`, optional
            A function that evaluates the partial derivatives of func.
        fprime_eqcons : callable `f(x,*args)`, optional
            A function of the form `f(x, *args)` that returns the m by n
            array of equality constraint normals.  If not provided,
            the normals will be approximated. The array returned by
            fprime_eqcons should be sized as ( len(eqcons), len(x0) ).
        fprime_ieqcons : callable `f(x,*args)`, optional
            A function of the form `f(x, *args)` that returns the m by n
            array of inequality constraint normals.  If not provided,
            the normals will be approximated. The array returned by
            fprime_ieqcons should be sized as ( len(ieqcons), len(x0) ).
        args : sequence, optional
            Additional arguments passed to func and fprime.
        iter : int, optional
            The maximum number of iterations.
        acc : float, optional
            Requested accuracy.
        iprint : int, optional
            The verbosity of fmin_slsqp :
        
            * iprint <= 0 : Silent operation
            * iprint == 1 : Print summary upon completion (default)
            * iprint >= 2 : Print status of each iterate and summary
        disp : int, optional
            Over-rides the iprint interface (preferred).
        full_output : bool, optional
            If False, return only the minimizer of func (default).
            Otherwise, output final objective function and summary
            information.
        epsilon : float, optional
            The step size for finite-difference derivative estimates.
        callback : callable, optional
            Called after each iteration, as ``callback(x)``, where ``x`` is the
            current parameter vector.
        
        Returns
        -------
        out : ndarray of float
            The final minimizer of func.
        fx : ndarray of float, if full_output is true
            The final value of the objective function.
        its : int, if full_output is true
            The number of iterations.
        imode : int, if full_output is true
            The exit mode from the optimizer (see below).
        smode : string, if full_output is true
            Message describing the exit mode from the optimizer.
        
        See also
        --------
        minimize: Interface to minimization algorithms for multivariate
            functions. See the 'SLSQP' `method` in particular.
        
        Notes
        -----
        Exit modes are defined as follows ::
        
            -1 : Gradient evaluation required (g & a)
             0 : Optimization terminated successfully.
             1 : Function evaluation required (f & c)
             2 : More equality constraints than independent variables
             3 : More than 3*n iterations in LSQ subproblem
             4 : Inequality constraints incompatible
             5 : Singular matrix E in LSQ subproblem
             6 : Singular matrix C in LSQ subproblem
             7 : Rank-deficient equality constraint subproblem HFTI
             8 : Positive directional derivative for linesearch
             9 : Iteration limit exceeded
        
        Examples
        --------
        Examples are given :ref:`in the tutorial <tutorial-sqlsp>`.
    
    fmin_tnc(func, x0, fprime=None, args=(), approx_grad=0, bounds=None, epsilon=1e-08, scale=None, offset=None, messages=15, maxCGit=-1, maxfun=None, eta=-1, stepmx=0, accuracy=0, fmin=0, ftol=-1, xtol=-1, pgtol=-1, rescale=-1, disp=None, callback=None)
        Minimize a function with variables subject to bounds, using
        gradient information in a truncated Newton algorithm. This
        method wraps a C implementation of the algorithm.
        
        Parameters
        ----------
        func : callable ``func(x, *args)``
            Function to minimize.  Must do one of:
        
            1. Return f and g, where f is the value of the function and g its
               gradient (a list of floats).
        
            2. Return the function value but supply gradient function
               separately as `fprime`.
        
            3. Return the function value and set ``approx_grad=True``.
        
            If the function returns None, the minimization
            is aborted.
        x0 : array_like
            Initial estimate of minimum.
        fprime : callable ``fprime(x, *args)``, optional
            Gradient of `func`. If None, then either `func` must return the
            function value and the gradient (``f,g = func(x, *args)``)
            or `approx_grad` must be True.
        args : tuple, optional
            Arguments to pass to function.
        approx_grad : bool, optional
            If true, approximate the gradient numerically.
        bounds : list, optional
            (min, max) pairs for each element in x0, defining the
            bounds on that parameter. Use None or +/-inf for one of
            min or max when there is no bound in that direction.
        epsilon : float, optional
            Used if approx_grad is True. The stepsize in a finite
            difference approximation for fprime.
        scale : array_like, optional
            Scaling factors to apply to each variable.  If None, the
            factors are up-low for interval bounded variables and
            1+|x| for the others.  Defaults to None.
        offset : array_like, optional
            Value to subtract from each variable.  If None, the
            offsets are (up+low)/2 for interval bounded variables
            and x for the others.
        messages : int, optional
            Bit mask used to select messages display during
            minimization values defined in the MSGS dict.  Defaults to
            MGS_ALL.
        disp : int, optional
            Integer interface to messages.  0 = no message, 5 = all messages
        maxCGit : int, optional
            Maximum number of hessian*vector evaluations per main
            iteration.  If maxCGit == 0, the direction chosen is
            -gradient if maxCGit < 0, maxCGit is set to
            max(1,min(50,n/2)).  Defaults to -1.
        maxfun : int, optional
            Maximum number of function evaluation.  if None, maxfun is
            set to max(100, 10*len(x0)).  Defaults to None.
        eta : float, optional
            Severity of the line search. if < 0 or > 1, set to 0.25.
            Defaults to -1.
        stepmx : float, optional
            Maximum step for the line search.  May be increased during
            call.  If too small, it will be set to 10.0.  Defaults to 0.
        accuracy : float, optional
            Relative precision for finite difference calculations.  If
            <= machine_precision, set to sqrt(machine_precision).
            Defaults to 0.
        fmin : float, optional
            Minimum function value estimate.  Defaults to 0.
        ftol : float, optional
            Precision goal for the value of f in the stoping criterion.
            If ftol < 0.0, ftol is set to 0.0 defaults to -1.
        xtol : float, optional
            Precision goal for the value of x in the stopping
            criterion (after applying x scaling factors).  If xtol <
            0.0, xtol is set to sqrt(machine_precision).  Defaults to
            -1.
        pgtol : float, optional
            Precision goal for the value of the projected gradient in
            the stopping criterion (after applying x scaling factors).
            If pgtol < 0.0, pgtol is set to 1e-2 * sqrt(accuracy).
            Setting it to 0.0 is not recommended.  Defaults to -1.
        rescale : float, optional
            Scaling factor (in log10) used to trigger f value
            rescaling.  If 0, rescale at each iteration.  If a large
            value, never rescale.  If < 0, rescale is set to 1.3.
        callback : callable, optional
            Called after each iteration, as callback(xk), where xk is the
            current parameter vector.
        
        Returns
        -------
        x : ndarray
            The solution.
        nfeval : int
            The number of function evaluations.
        rc : int
            Return code, see below
        
        See also
        --------
        minimize: Interface to minimization algorithms for multivariate
            functions. See the 'TNC' `method` in particular.
        
        Notes
        -----
        The underlying algorithm is truncated Newton, also called
        Newton Conjugate-Gradient. This method differs from
        scipy.optimize.fmin_ncg in that
        
        1. It wraps a C implementation of the algorithm
        2. It allows each variable to be given an upper and lower bound.
        
        The algorithm incoporates the bound constraints by determining
        the descent direction as in an unconstrained truncated Newton,
        but never taking a step-size large enough to leave the space
        of feasible x's. The algorithm keeps track of a set of
        currently active constraints, and ignores them when computing
        the minimum allowable step size. (The x's associated with the
        active constraint are kept fixed.) If the maximum allowable
        step size is zero then a new constraint is added. At the end
        of each iteration one of the constraints may be deemed no
        longer active and removed. A constraint is considered
        no longer active is if it is currently active
        but the gradient for that variable points inward from the
        constraint. The specific constraint removed is the one
        associated with the variable of largest index whose
        constraint is no longer active.
        
        Return codes are defined as follows::
        
            -1 : Infeasible (lower bound > upper bound)
             0 : Local minimum reached (|pg| ~= 0)
             1 : Converged (|f_n-f_(n-1)| ~= 0)
             2 : Converged (|x_n-x_(n-1)| ~= 0)
             3 : Max. number of function evaluations reached
             4 : Linear search failed
             5 : All lower bounds are equal to the upper bounds
             6 : Unable to progress
             7 : User requested end of minimization
        
        References
        ----------
        Wright S., Nocedal J. (2006), 'Numerical Optimization'
        
        Nash S.G. (1984), "Newton-Type Minimization Via the Lanczos Method",
        SIAM Journal of Numerical Analysis 21, pp. 770-778
    
    fminbound(func, x1, x2, args=(), xtol=1e-05, maxfun=500, full_output=0, disp=1)
        Bounded minimization for scalar functions.
        
        Parameters
        ----------
        func : callable f(x,*args)
            Objective function to be minimized (must accept and return scalars).
        x1, x2 : float or array scalar
            The optimization bounds.
        args : tuple, optional
            Extra arguments passed to function.
        xtol : float, optional
            The convergence tolerance.
        maxfun : int, optional
            Maximum number of function evaluations allowed.
        full_output : bool, optional
            If True, return optional outputs.
        disp : int, optional
            If non-zero, print messages.
                0 : no message printing.
                1 : non-convergence notification messages only.
                2 : print a message on convergence too.
                3 : print iteration results.
        
        
        Returns
        -------
        xopt : ndarray
            Parameters (over given interval) which minimize the
            objective function.
        fval : number
            The function value at the minimum point.
        ierr : int
            An error flag (0 if converged, 1 if maximum number of
            function calls reached).
        numfunc : int
          The number of function calls made.
        
        See also
        --------
        minimize_scalar: Interface to minimization algorithms for scalar
            univariate functions. See the 'Bounded' `method` in particular.
        
        Notes
        -----
        Finds a local minimizer of the scalar function `func` in the
        interval x1 < xopt < x2 using Brent's method.  (See `brent`
        for auto-bracketing).
    
    fsolve(func, x0, args=(), fprime=None, full_output=0, col_deriv=0, xtol=1.49012e-08, maxfev=0, band=None, epsfcn=None, factor=100, diag=None)
        Find the roots of a function.
        
        Return the roots of the (non-linear) equations defined by
        ``func(x) = 0`` given a starting estimate.
        
        Parameters
        ----------
        func : callable ``f(x, *args)``
            A function that takes at least one (possibly vector) argument.
        x0 : ndarray
            The starting estimate for the roots of ``func(x) = 0``.
        args : tuple, optional
            Any extra arguments to `func`.
        fprime : callable(x), optional
            A function to compute the Jacobian of `func` with derivatives
            across the rows. By default, the Jacobian will be estimated.
        full_output : bool, optional
            If True, return optional outputs.
        col_deriv : bool, optional
            Specify whether the Jacobian function computes derivatives down
            the columns (faster, because there is no transpose operation).
        xtol : float, optional
            The calculation will terminate if the relative error between two
            consecutive iterates is at most `xtol`.
        maxfev : int, optional
            The maximum number of calls to the function. If zero, then
            ``100*(N+1)`` is the maximum where N is the number of elements
            in `x0`.
        band : tuple, optional
            If set to a two-sequence containing the number of sub- and
            super-diagonals within the band of the Jacobi matrix, the
            Jacobi matrix is considered banded (only for ``fprime=None``).
        epsfcn : float, optional
            A suitable step length for the forward-difference
            approximation of the Jacobian (for ``fprime=None``). If
            `epsfcn` is less than the machine precision, it is assumed
            that the relative errors in the functions are of the order of
            the machine precision.
        factor : float, optional
            A parameter determining the initial step bound
            (``factor * || diag * x||``).  Should be in the interval
            ``(0.1, 100)``.
        diag : sequence, optional
            N positive entries that serve as a scale factors for the
            variables.
        
        Returns
        -------
        x : ndarray
            The solution (or the result of the last iteration for
            an unsuccessful call).
        infodict : dict
            A dictionary of optional outputs with the keys:
        
            ``nfev``
                number of function calls
            ``njev``
                number of Jacobian calls
            ``fvec``
                function evaluated at the output
            ``fjac``
                the orthogonal matrix, q, produced by the QR
                factorization of the final approximate Jacobian
                matrix, stored column wise
            ``r``
                upper triangular matrix produced by QR factorization
                of the same matrix
            ``qtf``
                the vector ``(transpose(q) * fvec)``
        
        ier : int
            An integer flag.  Set to 1 if a solution was found, otherwise refer
            to `mesg` for more information.
        mesg : str
            If no solution is found, `mesg` details the cause of failure.
        
        See Also
        --------
        root : Interface to root finding algorithms for multivariate
        functions. See the 'hybr' `method` in particular.
        
        Notes
        -----
        ``fsolve`` is a wrapper around MINPACK's hybrd and hybrj algorithms.
    
    golden(func, args=(), brack=None, tol=1.4901161193847656e-08, full_output=0)
        Return the minimum of a function of one variable.
        
        Given a function of one variable and a possible bracketing interval,
        return the minimum of the function isolated to a fractional precision of
        tol.
        
        Parameters
        ----------
        func : callable func(x,*args)
            Objective function to minimize.
        args : tuple, optional
            Additional arguments (if present), passed to func.
        brack : tuple, optional
            Triple (a,b,c), where (a<b<c) and func(b) <
            func(a),func(c).  If bracket consists of two numbers (a,
            c), then they are assumed to be a starting interval for a
            downhill bracket search (see `bracket`); it doesn't always
            mean that obtained solution will satisfy a<=x<=c.
        tol : float, optional
            x tolerance stop criterion
        full_output : bool, optional
            If True, return optional outputs.
        
        See also
        --------
        minimize_scalar: Interface to minimization algorithms for scalar
            univariate functions. See the 'Golden' `method` in particular.
        
        Notes
        -----
        Uses analog of bisection method to decrease the bracketed
        interval.
    
    least_squares(fun, x0, jac='2-point', bounds=(-inf, inf), method='trf', ftol=1e-08, xtol=1e-08, gtol=1e-08, x_scale=1.0, loss='linear', f_scale=1.0, diff_step=None, tr_solver=None, tr_options={}, jac_sparsity=None, max_nfev=None, verbose=0, args=(), kwargs={})
        Solve a nonlinear least-squares problem with bounds on the variables.
        
        Given the residuals f(x) (an m-dimensional function of n variables) and
        the loss function rho(s) (a scalar function), `least_squares` finds a
        local minimum of the cost function F(x)::
        
            minimize F(x) = 0.5 * sum(rho(f_i(x)**2), i = 0, ..., m - 1)
            subject to lb <= x <= ub
        
        The purpose of the loss function rho(s) is to reduce the influence of
        outliers on the solution.
        
        Parameters
        ----------
        fun : callable
            Function which computes the vector of residuals, with the signature
            ``fun(x, *args, **kwargs)``, i.e., the minimization proceeds with
            respect to its first argument. The argument ``x`` passed to this
            function is an ndarray of shape (n,) (never a scalar, even for n=1).
            It must return a 1-d array_like of shape (m,) or a scalar.
        x0 : array_like with shape (n,) or float
            Initial guess on independent variables. If float, it will be treated
            as a 1-d array with one element.
        jac : {'2-point', '3-point', 'cs', callable}, optional
            Method of computing the Jacobian matrix (an m-by-n matrix, where
            element (i, j) is the partial derivative of f[i] with respect to
            x[j]). The keywords select a finite difference scheme for numerical
            estimation. The scheme '3-point' is more accurate, but requires
            twice as much operations compared to '2-point' (default). The
            scheme 'cs' uses complex steps, and while potentially the most
            accurate, it is applicable only when `fun` correctly handles
            complex inputs and can be analytically continued to the complex
            plane. Method 'lm' always uses the '2-point' scheme. If callable,
            it is used as ``jac(x, *args, **kwargs)`` and should return a
            good approximation (or the exact value) for the Jacobian as an
            array_like (np.atleast_2d is applied), a sparse matrix or a
            `scipy.sparse.linalg.LinearOperator`.
        bounds : 2-tuple of array_like, optional
            Lower and upper bounds on independent variables. Defaults to no bounds.
            Each array must match the size of `x0` or be a scalar, in the latter
            case a bound will be the same for all variables. Use ``np.inf`` with
            an appropriate sign to disable bounds on all or some variables.
        method : {'trf', 'dogbox', 'lm'}, optional
            Algorithm to perform minimization.
        
                * 'trf' : Trust Region Reflective algorithm, particularly suitable
                  for large sparse problems with bounds. Generally robust method.
                * 'dogbox' : dogleg algorithm with rectangular trust regions,
                  typical use case is small problems with bounds. Not recommended
                  for problems with rank-deficient Jacobian.
                * 'lm' : Levenberg-Marquardt algorithm as implemented in MINPACK.
                  Doesn't handle bounds and sparse Jacobians. Usually the most
                  efficient method for small unconstrained problems.
        
            Default is 'trf'. See Notes for more information.
        ftol : float, optional
            Tolerance for termination by the change of the cost function. Default
            is 1e-8. The optimization process is stopped when  ``dF < ftol * F``,
            and there was an adequate agreement between a local quadratic model and
            the true model in the last step.
        xtol : float, optional
            Tolerance for termination by the change of the independent variables.
            Default is 1e-8. The exact condition depends on the `method` used:
        
                * For 'trf' and 'dogbox' : ``norm(dx) < xtol * (xtol + norm(x))``
                * For 'lm' : ``Delta < xtol * norm(xs)``, where ``Delta`` is
                  a trust-region radius and ``xs`` is the value of ``x``
                  scaled according to `x_scale` parameter (see below).
        
        gtol : float, optional
            Tolerance for termination by the norm of the gradient. Default is 1e-8.
            The exact condition depends on a `method` used:
        
                * For 'trf' : ``norm(g_scaled, ord=np.inf) < gtol``, where
                  ``g_scaled`` is the value of the gradient scaled to account for
                  the presence of the bounds [STIR]_.
                * For 'dogbox' : ``norm(g_free, ord=np.inf) < gtol``, where
                  ``g_free`` is the gradient with respect to the variables which
                  are not in the optimal state on the boundary.
                * For 'lm' : the maximum absolute value of the cosine of angles
                  between columns of the Jacobian and the residual vector is less
                  than `gtol`, or the residual vector is zero.
        
        x_scale : array_like or 'jac', optional
            Characteristic scale of each variable. Setting `x_scale` is equivalent
            to reformulating the problem in scaled variables ``xs = x / x_scale``.
            An alternative view is that the size of a trust region along j-th
            dimension is proportional to ``x_scale[j]``. Improved convergence may
            be achieved by setting `x_scale` such that a step of a given size
            along any of the scaled variables has a similar effect on the cost
            function. If set to 'jac', the scale is iteratively updated using the
            inverse norms of the columns of the Jacobian matrix (as described in
            [JJMore]_).
        loss : str or callable, optional
            Determines the loss function. The following keyword values are allowed:
        
                * 'linear' (default) : ``rho(z) = z``. Gives a standard
                  least-squares problem.
                * 'soft_l1' : ``rho(z) = 2 * ((1 + z)**0.5 - 1)``. The smooth
                  approximation of l1 (absolute value) loss. Usually a good
                  choice for robust least squares.
                * 'huber' : ``rho(z) = z if z <= 1 else 2*z**0.5 - 1``. Works
                  similarly to 'soft_l1'.
                * 'cauchy' : ``rho(z) = ln(1 + z)``. Severely weakens outliers
                  influence, but may cause difficulties in optimization process.
                * 'arctan' : ``rho(z) = arctan(z)``. Limits a maximum loss on
                  a single residual, has properties similar to 'cauchy'.
        
            If callable, it must take a 1-d ndarray ``z=f**2`` and return an
            array_like with shape (3, m) where row 0 contains function values,
            row 1 contains first derivatives and row 2 contains second
            derivatives. Method 'lm' supports only 'linear' loss.
        f_scale : float, optional
            Value of soft margin between inlier and outlier residuals, default
            is 1.0. The loss function is evaluated as follows
            ``rho_(f**2) = C**2 * rho(f**2 / C**2)``, where ``C`` is `f_scale`,
            and ``rho`` is determined by `loss` parameter. This parameter has
            no effect with ``loss='linear'``, but for other `loss` values it is
            of crucial importance.
        max_nfev : None or int, optional
            Maximum number of function evaluations before the termination.
            If None (default), the value is chosen automatically:
        
                * For 'trf' and 'dogbox' : 100 * n.
                * For 'lm' :  100 * n if `jac` is callable and 100 * n * (n + 1)
                  otherwise (because 'lm' counts function calls in Jacobian
                  estimation).
        
        diff_step : None or array_like, optional
            Determines the relative step size for the finite difference
            approximation of the Jacobian. The actual step is computed as
            ``x * diff_step``. If None (default), then `diff_step` is taken to be
            a conventional "optimal" power of machine epsilon for the finite
            difference scheme used [NR]_.
        tr_solver : {None, 'exact', 'lsmr'}, optional
            Method for solving trust-region subproblems, relevant only for 'trf'
            and 'dogbox' methods.
        
                * 'exact' is suitable for not very large problems with dense
                  Jacobian matrices. The computational complexity per iteration is
                  comparable to a singular value decomposition of the Jacobian
                  matrix.
                * 'lsmr' is suitable for problems with sparse and large Jacobian
                  matrices. It uses the iterative procedure
                  `scipy.sparse.linalg.lsmr` for finding a solution of a linear
                  least-squares problem and only requires matrix-vector product
                  evaluations.
        
            If None (default) the solver is chosen based on the type of Jacobian
            returned on the first iteration.
        tr_options : dict, optional
            Keyword options passed to trust-region solver.
        
                * ``tr_solver='exact'``: `tr_options` are ignored.
                * ``tr_solver='lsmr'``: options for `scipy.sparse.linalg.lsmr`.
                  Additionally  ``method='trf'`` supports  'regularize' option
                  (bool, default is True) which adds a regularization term to the
                  normal equation, which improves convergence if the Jacobian is
                  rank-deficient [Byrd]_ (eq. 3.4).
        
        jac_sparsity : {None, array_like, sparse matrix}, optional
            Defines the sparsity structure of the Jacobian matrix for finite
            difference estimation, its shape must be (m, n). If the Jacobian has
            only few non-zero elements in *each* row, providing the sparsity
            structure will greatly speed up the computations [Curtis]_. A zero
            entry means that a corresponding element in the Jacobian is identically
            zero. If provided, forces the use of 'lsmr' trust-region solver.
            If None (default) then dense differencing will be used. Has no effect
            for 'lm' method.
        verbose : {0, 1, 2}, optional
            Level of algorithm's verbosity:
        
                * 0 (default) : work silently.
                * 1 : display a termination report.
                * 2 : display progress during iterations (not supported by 'lm'
                  method).
        
        args, kwargs : tuple and dict, optional
            Additional arguments passed to `fun` and `jac`. Both empty by default.
            The calling signature is ``fun(x, *args, **kwargs)`` and the same for
            `jac`.
        
        Returns
        -------
        `OptimizeResult` with the following fields defined:
        x : ndarray, shape (n,)
            Solution found.
        cost : float
            Value of the cost function at the solution.
        fun : ndarray, shape (m,)
            Vector of residuals at the solution.
        jac : ndarray, sparse matrix or LinearOperator, shape (m, n)
            Modified Jacobian matrix at the solution, in the sense that J^T J
            is a Gauss-Newton approximation of the Hessian of the cost function.
            The type is the same as the one used by the algorithm.
        grad : ndarray, shape (m,)
            Gradient of the cost function at the solution.
        optimality : float
            First-order optimality measure. In unconstrained problems, it is always
            the uniform norm of the gradient. In constrained problems, it is the
            quantity which was compared with `gtol` during iterations.
        active_mask : ndarray of int, shape (n,)
            Each component shows whether a corresponding constraint is active
            (that is, whether a variable is at the bound):
        
                *  0 : a constraint is not active.
                * -1 : a lower bound is active.
                *  1 : an upper bound is active.
        
            Might be somewhat arbitrary for 'trf' method as it generates a sequence
            of strictly feasible iterates and `active_mask` is determined within a
            tolerance threshold.
        nfev : int
            Number of function evaluations done. Methods 'trf' and 'dogbox' do not
            count function calls for numerical Jacobian approximation, as opposed
            to 'lm' method.
        njev : int or None
            Number of Jacobian evaluations done. If numerical Jacobian
            approximation is used in 'lm' method, it is set to None.
        status : int
            The reason for algorithm termination:
        
                * -1 : improper input parameters status returned from MINPACK.
                *  0 : the maximum number of function evaluations is exceeded.
                *  1 : `gtol` termination condition is satisfied.
                *  2 : `ftol` termination condition is satisfied.
                *  3 : `xtol` termination condition is satisfied.
                *  4 : Both `ftol` and `xtol` termination conditions are satisfied.
        
        message : str
            Verbal description of the termination reason.
        success : bool
            True if one of the convergence criteria is satisfied (`status` > 0).
        
        See Also
        --------
        leastsq : A legacy wrapper for the MINPACK implementation of the
                  Levenberg-Marquadt algorithm.
        curve_fit : Least-squares minimization applied to a curve fitting problem.
        
        Notes
        -----
        Method 'lm' (Levenberg-Marquardt) calls a wrapper over least-squares
        algorithms implemented in MINPACK (lmder, lmdif). It runs the
        Levenberg-Marquardt algorithm formulated as a trust-region type algorithm.
        The implementation is based on paper [JJMore]_, it is very robust and
        efficient with a lot of smart tricks. It should be your first choice
        for unconstrained problems. Note that it doesn't support bounds. Also
        it doesn't work when m < n.
        
        Method 'trf' (Trust Region Reflective) is motivated by the process of
        solving a system of equations, which constitute the first-order optimality
        condition for a bound-constrained minimization problem as formulated in
        [STIR]_. The algorithm iteratively solves trust-region subproblems
        augmented by a special diagonal quadratic term and with trust-region shape
        determined by the distance from the bounds and the direction of the
        gradient. This enhancements help to avoid making steps directly into bounds
        and efficiently explore the whole space of variables. To further improve
        convergence, the algorithm considers search directions reflected from the
        bounds. To obey theoretical requirements, the algorithm keeps iterates
        strictly feasible. With dense Jacobians trust-region subproblems are
        solved by an exact method very similar to the one described in [JJMore]_
        (and implemented in MINPACK). The difference from the MINPACK
        implementation is that a singular value decomposition of a Jacobian
        matrix is done once per iteration, instead of a QR decomposition and series
        of Givens rotation eliminations. For large sparse Jacobians a 2-d subspace
        approach of solving trust-region subproblems is used [STIR]_, [Byrd]_.
        The subspace is spanned by a scaled gradient and an approximate
        Gauss-Newton solution delivered by `scipy.sparse.linalg.lsmr`. When no
        constraints are imposed the algorithm is very similar to MINPACK and has
        generally comparable performance. The algorithm works quite robust in
        unbounded and bounded problems, thus it is chosen as a default algorithm.
        
        Method 'dogbox' operates in a trust-region framework, but considers
        rectangular trust regions as opposed to conventional ellipsoids [Voglis]_.
        The intersection of a current trust region and initial bounds is again
        rectangular, so on each iteration a quadratic minimization problem subject
        to bound constraints is solved approximately by Powell's dogleg method
        [NumOpt]_. The required Gauss-Newton step can be computed exactly for
        dense Jacobians or approximately by `scipy.sparse.linalg.lsmr` for large
        sparse Jacobians. The algorithm is likely to exhibit slow convergence when
        the rank of Jacobian is less than the number of variables. The algorithm
        often outperforms 'trf' in bounded problems with a small number of
        variables.
        
        Robust loss functions are implemented as described in [BA]_. The idea
        is to modify a residual vector and a Jacobian matrix on each iteration
        such that computed gradient and Gauss-Newton Hessian approximation match
        the true gradient and Hessian approximation of the cost function. Then
        the algorithm proceeds in a normal way, i.e. robust loss functions are
        implemented as a simple wrapper over standard least-squares algorithms.
        
        .. versionadded:: 0.17.0
        
        References
        ----------
        .. [STIR] M. A. Branch, T. F. Coleman, and Y. Li, "A Subspace, Interior,
                  and Conjugate Gradient Method for Large-Scale Bound-Constrained
                  Minimization Problems," SIAM Journal on Scientific Computing,
                  Vol. 21, Number 1, pp 1-23, 1999.
        .. [NR] William H. Press et. al., "Numerical Recipes. The Art of Scientific
                Computing. 3rd edition", Sec. 5.7.
        .. [Byrd] R. H. Byrd, R. B. Schnabel and G. A. Shultz, "Approximate
                  solution of the trust region problem by minimization over
                  two-dimensional subspaces", Math. Programming, 40, pp. 247-263,
                  1988.
        .. [Curtis] A. Curtis, M. J. D. Powell, and J. Reid, "On the estimation of
                    sparse Jacobian matrices", Journal of the Institute of
                    Mathematics and its Applications, 13, pp. 117-120, 1974.
        .. [JJMore] J. J. More, "The Levenberg-Marquardt Algorithm: Implementation
                    and Theory," Numerical Analysis, ed. G. A. Watson, Lecture
                    Notes in Mathematics 630, Springer Verlag, pp. 105-116, 1977.
        .. [Voglis] C. Voglis and I. E. Lagaris, "A Rectangular Trust Region
                    Dogleg Approach for Unconstrained and Bound Constrained
                    Nonlinear Optimization", WSEAS International Conference on
                    Applied Mathematics, Corfu, Greece, 2004.
        .. [NumOpt] J. Nocedal and S. J. Wright, "Numerical optimization,
                    2nd edition", Chapter 4.
        .. [BA] B. Triggs et. al., "Bundle Adjustment - A Modern Synthesis",
                Proceedings of the International Workshop on Vision Algorithms:
                Theory and Practice, pp. 298-372, 1999.
        
        Examples
        --------
        In this example we find a minimum of the Rosenbrock function without bounds
        on independed variables.
        
        >>> def fun_rosenbrock(x):
        ...     return np.array([10 * (x[1] - x[0]**2), (1 - x[0])])
        
        Notice that we only provide the vector of the residuals. The algorithm
        constructs the cost function as a sum of squares of the residuals, which
        gives the Rosenbrock function. The exact minimum is at ``x = [1.0, 1.0]``.
        
        >>> from scipy.optimize import least_squares
        >>> x0_rosenbrock = np.array([2, 2])
        >>> res_1 = least_squares(fun_rosenbrock, x0_rosenbrock)
        >>> res_1.x
        array([ 1.,  1.])
        >>> res_1.cost
        9.8669242910846867e-30
        >>> res_1.optimality
        8.8928864934219529e-14
        
        We now constrain the variables, in such a way that the previous solution
        becomes infeasible. Specifically, we require that ``x[1] >= 1.5``, and
        ``x[0]`` left unconstrained. To this end, we specify the `bounds` parameter
        to `least_squares` in the form ``bounds=([-np.inf, 1.5], np.inf)``.
        
        We also provide the analytic Jacobian:
        
        >>> def jac_rosenbrock(x):
        ...     return np.array([
        ...         [-20 * x[0], 10],
        ...         [-1, 0]])
        
        Putting this all together, we see that the new solution lies on the bound:
        
        >>> res_2 = least_squares(fun_rosenbrock, x0_rosenbrock, jac_rosenbrock,
        ...                       bounds=([-np.inf, 1.5], np.inf))
        >>> res_2.x
        array([ 1.22437075,  1.5       ])
        >>> res_2.cost
        0.025213093946805685
        >>> res_2.optimality
        1.5885401433157753e-07
        
        Now we solve a system of equations (i.e., the cost function should be zero
        at a minimum) for a Broyden tridiagonal vector-valued function of 100000
        variables:
        
        >>> def fun_broyden(x):
        ...     f = (3 - x) * x + 1
        ...     f[1:] -= x[:-1]
        ...     f[:-1] -= 2 * x[1:]
        ...     return f
        
        The corresponding Jacobian matrix is sparse. We tell the algorithm to
        estimate it by finite differences and provide the sparsity structure of
        Jacobian to significantly speed up this process.
        
        >>> from scipy.sparse import lil_matrix
        >>> def sparsity_broyden(n):
        ...     sparsity = lil_matrix((n, n), dtype=int)
        ...     i = np.arange(n)
        ...     sparsity[i, i] = 1
        ...     i = np.arange(1, n)
        ...     sparsity[i, i - 1] = 1
        ...     i = np.arange(n - 1)
        ...     sparsity[i, i + 1] = 1
        ...     return sparsity
        ...
        >>> n = 100000
        >>> x0_broyden = -np.ones(n)
        ...
        >>> res_3 = least_squares(fun_broyden, x0_broyden,
        ...                       jac_sparsity=sparsity_broyden(n))
        >>> res_3.cost
        4.5687069299604613e-23
        >>> res_3.optimality
        1.1650454296851518e-11
        
        Let's also solve a curve fitting problem using robust loss function to
        take care of outliers in the data. Define the model function as
        ``y = a + b * exp(c * t)``, where t is a predictor variable, y is an
        observation and a, b, c are parameters to estimate.
        
        First, define the function which generates the data with noise and
        outliers, define the model parameters, and generate data:
        
        >>> def gen_data(t, a, b, c, noise=0, n_outliers=0, random_state=0):
        ...     y = a + b * np.exp(t * c)
        ...
        ...     rnd = np.random.RandomState(random_state)
        ...     error = noise * rnd.randn(t.size)
        ...     outliers = rnd.randint(0, t.size, n_outliers)
        ...     error[outliers] *= 10
        ...
        ...     return y + error
        ...
        >>> a = 0.5
        >>> b = 2.0
        >>> c = -1
        >>> t_min = 0
        >>> t_max = 10
        >>> n_points = 15
        ...
        >>> t_train = np.linspace(t_min, t_max, n_points)
        >>> y_train = gen_data(t_train, a, b, c, noise=0.1, n_outliers=3)
        
        Define function for computing residuals and initial estimate of
        parameters.
        
        >>> def fun(x, t, y):
        ...     return x[0] + x[1] * np.exp(x[2] * t) - y
        ...
        >>> x0 = np.array([1.0, 1.0, 0.0])
        
        Compute a standard least-squares solution:
        
        >>> res_lsq = least_squares(fun, x0, args=(t_train, y_train))
        
        Now compute two solutions with two different robust loss functions. The
        parameter `f_scale` is set to 0.1, meaning that inlier residuals should
        not significantly exceed 0.1 (the noise level used).
        
        >>> res_soft_l1 = least_squares(fun, x0, loss='soft_l1', f_scale=0.1,
        ...                             args=(t_train, y_train))
        >>> res_log = least_squares(fun, x0, loss='cauchy', f_scale=0.1,
        ...                         args=(t_train, y_train))
        
        And finally plot all the curves. We see that by selecting an appropriate
        `loss`  we can get estimates close to optimal even in the presence of
        strong outliers. But keep in mind that generally it is recommended to try
        'soft_l1' or 'huber' losses first (if at all necessary) as the other two
        options may cause difficulties in optimization process.
        
        >>> t_test = np.linspace(t_min, t_max, n_points * 10)
        >>> y_true = gen_data(t_test, a, b, c)
        >>> y_lsq = gen_data(t_test, *res_lsq.x)
        >>> y_soft_l1 = gen_data(t_test, *res_soft_l1.x)
        >>> y_log = gen_data(t_test, *res_log.x)
        ...
        >>> import matplotlib.pyplot as plt
        >>> plt.plot(t_train, y_train, 'o')
        >>> plt.plot(t_test, y_true, 'k', linewidth=2, label='true')
        >>> plt.plot(t_test, y_lsq, label='linear loss')
        >>> plt.plot(t_test, y_soft_l1, label='soft_l1 loss')
        >>> plt.plot(t_test, y_log, label='cauchy loss')
        >>> plt.xlabel("t")
        >>> plt.ylabel("y")
        >>> plt.legend()
        >>> plt.show()
    
    leastsq(func, x0, args=(), Dfun=None, full_output=0, col_deriv=0, ftol=1.49012e-08, xtol=1.49012e-08, gtol=0.0, maxfev=0, epsfcn=None, factor=100, diag=None)
        Minimize the sum of squares of a set of equations.
        
        ::
        
            x = arg min(sum(func(y)**2,axis=0))
                     y
        
        Parameters
        ----------
        func : callable
            should take at least one (possibly length N vector) argument and
            returns M floating point numbers. It must not return NaNs or
            fitting might fail.
        x0 : ndarray
            The starting estimate for the minimization.
        args : tuple, optional
            Any extra arguments to func are placed in this tuple.
        Dfun : callable, optional
            A function or method to compute the Jacobian of func with derivatives
            across the rows. If this is None, the Jacobian will be estimated.
        full_output : bool, optional
            non-zero to return all optional outputs.
        col_deriv : bool, optional
            non-zero to specify that the Jacobian function computes derivatives
            down the columns (faster, because there is no transpose operation).
        ftol : float, optional
            Relative error desired in the sum of squares.
        xtol : float, optional
            Relative error desired in the approximate solution.
        gtol : float, optional
            Orthogonality desired between the function vector and the columns of
            the Jacobian.
        maxfev : int, optional
            The maximum number of calls to the function. If `Dfun` is provided
            then the default `maxfev` is 100*(N+1) where N is the number of elements
            in x0, otherwise the default `maxfev` is 200*(N+1).
        epsfcn : float, optional
            A variable used in determining a suitable step length for the forward-
            difference approximation of the Jacobian (for Dfun=None).
            Normally the actual step length will be sqrt(epsfcn)*x
            If epsfcn is less than the machine precision, it is assumed that the
            relative errors are of the order of the machine precision.
        factor : float, optional
            A parameter determining the initial step bound
            (``factor * || diag * x||``). Should be in interval ``(0.1, 100)``.
        diag : sequence, optional
            N positive entries that serve as a scale factors for the variables.
        
        Returns
        -------
        x : ndarray
            The solution (or the result of the last iteration for an unsuccessful
            call).
        cov_x : ndarray
            Uses the fjac and ipvt optional outputs to construct an
            estimate of the jacobian around the solution. None if a
            singular matrix encountered (indicates very flat curvature in
            some direction).  This matrix must be multiplied by the
            residual variance to get the covariance of the
            parameter estimates -- see curve_fit.
        infodict : dict
            a dictionary of optional outputs with the key s:
        
            ``nfev``
                The number of function calls
            ``fvec``
                The function evaluated at the output
            ``fjac``
                A permutation of the R matrix of a QR
                factorization of the final approximate
                Jacobian matrix, stored column wise.
                Together with ipvt, the covariance of the
                estimate can be approximated.
            ``ipvt``
                An integer array of length N which defines
                a permutation matrix, p, such that
                fjac*p = q*r, where r is upper triangular
                with diagonal elements of nonincreasing
                magnitude. Column j of p is column ipvt(j)
                of the identity matrix.
            ``qtf``
                The vector (transpose(q) * fvec).
        
        mesg : str
            A string message giving information about the cause of failure.
        ier : int
            An integer flag.  If it is equal to 1, 2, 3 or 4, the solution was
            found.  Otherwise, the solution was not found. In either case, the
            optional output variable 'mesg' gives more information.
        
        Notes
        -----
        "leastsq" is a wrapper around MINPACK's lmdif and lmder algorithms.
        
        cov_x is a Jacobian approximation to the Hessian of the least squares
        objective function.
        This approximation assumes that the objective function is based on the
        difference between some observed target data (ydata) and a (non-linear)
        function of the parameters `f(xdata, params)` ::
        
               func(params) = ydata - f(xdata, params)
        
        so that the objective function is ::
        
               min   sum((ydata - f(xdata, params))**2, axis=0)
             params
    
    line_search = line_search_wolfe2(f, myfprime, xk, pk, gfk=None, old_fval=None, old_old_fval=None, args=(), c1=0.0001, c2=0.9, amax=50)
        Find alpha that satisfies strong Wolfe conditions.
        
        Parameters
        ----------
        f : callable f(x,*args)
            Objective function.
        myfprime : callable f'(x,*args)
            Objective function gradient.
        xk : ndarray
            Starting point.
        pk : ndarray
            Search direction.
        gfk : ndarray, optional
            Gradient value for x=xk (xk being the current parameter
            estimate). Will be recomputed if omitted.
        old_fval : float, optional
            Function value for x=xk. Will be recomputed if omitted.
        old_old_fval : float, optional
            Function value for the point preceding x=xk
        args : tuple, optional
            Additional arguments passed to objective function.
        c1 : float, optional
            Parameter for Armijo condition rule.
        c2 : float, optional
            Parameter for curvature condition rule.
        amax : float, optional
            Maximum step size
        
        Returns
        -------
        alpha : float or None
            Alpha for which ``x_new = x0 + alpha * pk``,
            or None if the line search algorithm did not converge.
        fc : int
            Number of function evaluations made.
        gc : int
            Number of gradient evaluations made.
        new_fval : float or None
            New function value ``f(x_new)=f(x0+alpha*pk)``,
            or None if the line search algorithm did not converge.
        old_fval : float
            Old function value ``f(x0)``.
        new_slope : float or None
            The local slope along the search direction at the
            new value ``<myfprime(x_new), pk>``,
            or None if the line search algorithm did not converge.
        
        
        Notes
        -----
        Uses the line search algorithm to enforce strong Wolfe
        conditions.  See Wright and Nocedal, 'Numerical Optimization',
        1999, pg. 59-60.
        
        For the zoom phase it uses an algorithm by [...].
    
    linear_sum_assignment(cost_matrix)
        Solve the linear sum assignment problem.
        
        The linear sum assignment problem is also known as minimum weight matching
        in bipartite graphs. A problem instance is described by a matrix C, where
        each C[i,j] is the cost of matching vertex i of the first partite set
        (a "worker") and vertex j of the second set (a "job"). The goal is to find
        a complete assignment of workers to jobs of minimal cost.
        
        Formally, let X be a boolean matrix where :math:`X[i,j] = 1` iff row i is
        assigned to column j. Then the optimal assignment has cost
        
        .. math::
            \min \sum_i \sum_j C_{i,j} X_{i,j}
        
        s.t. each row is assignment to at most one column, and each column to at
        most one row.
        
        This function can also solve a generalization of the classic assignment
        problem where the cost matrix is rectangular. If it has more rows than
        columns, then not every row needs to be assigned to a column, and vice
        versa.
        
        The method used is the Hungarian algorithm, also known as the Munkres or
        Kuhn-Munkres algorithm.
        
        Parameters
        ----------
        cost_matrix : array
            The cost matrix of the bipartite graph.
        
        Returns
        -------
        row_ind, col_ind : array
            An array of row indices and one of corresponding column indices giving
            the optimal assignment. The cost of the assignment can be computed
            as ``cost_matrix[row_ind, col_ind].sum()``. The row indices will be
            sorted; in the case of a square cost matrix they will be equal to
            ``numpy.arange(cost_matrix.shape[0])``.
        
        Notes
        -----
        .. versionadded:: 0.17.0
        
        Examples
        --------
        >>> cost = np.array([[4, 1, 3], [2, 0, 5], [3, 2, 2]])
        >>> from scipy.optimize import linear_sum_assignment
        >>> row_ind, col_ind = linear_sum_assignment(cost)
        >>> col_ind
        array([1, 0, 2])
        >>> cost[row_ind, col_ind].sum()
        5
        
        References
        ----------
        1. http://csclab.murraystate.edu/bob.pilgrim/445/munkres.html
        
        2. Harold W. Kuhn. The Hungarian Method for the assignment problem.
           *Naval Research Logistics Quarterly*, 2:83-97, 1955.
        
        3. Harold W. Kuhn. Variants of the Hungarian method for assignment
           problems. *Naval Research Logistics Quarterly*, 3: 253-258, 1956.
        
        4. Munkres, J. Algorithms for the Assignment and Transportation Problems.
           *J. SIAM*, 5(1):32-38, March, 1957.
        
        5. https://en.wikipedia.org/wiki/Hungarian_algorithm
    
    linearmixing(F, xin, iter=None, alpha=None, verbose=False, maxiter=None, f_tol=None, f_rtol=None, x_tol=None, x_rtol=None, tol_norm=None, line_search='armijo', callback=None, **kw)
        Find a root of a function, using a scalar Jacobian approximation.
        
        .. warning::
        
           This algorithm may be useful for specific problems, but whether
           it will work may depend strongly on the problem.
        
        Parameters
        ----------
        F : function(x) -> f
            Function whose root to find; should take and return an array-like
            object.
        x0 : array_like
            Initial guess for the solution
        alpha : float, optional
            The Jacobian approximation is (-1/alpha).
        iter : int, optional
            Number of iterations to make. If omitted (default), make as many
            as required to meet tolerances.
        verbose : bool, optional
            Print status to stdout on every iteration.
        maxiter : int, optional
            Maximum number of iterations to make. If more are needed to
            meet convergence, `NoConvergence` is raised.
        f_tol : float, optional
            Absolute tolerance (in max-norm) for the residual.
            If omitted, default is 6e-6.
        f_rtol : float, optional
            Relative tolerance for the residual. If omitted, not used.
        x_tol : float, optional
            Absolute minimum step size, as determined from the Jacobian
            approximation. If the step size is smaller than this, optimization
            is terminated as successful. If omitted, not used.
        x_rtol : float, optional
            Relative minimum step size. If omitted, not used.
        tol_norm : function(vector) -> scalar, optional
            Norm to use in convergence check. Default is the maximum norm.
        line_search : {None, 'armijo' (default), 'wolfe'}, optional
            Which type of a line search to use to determine the step size in the
            direction given by the Jacobian approximation. Defaults to 'armijo'.
        callback : function, optional
            Optional callback function. It is called on every iteration as
            ``callback(x, f)`` where `x` is the current solution and `f`
            the corresponding residual.
        
        Returns
        -------
        sol : ndarray
            An array (of similar array type as `x0`) containing the final solution.
        
        Raises
        ------
        NoConvergence
            When a solution was not found.
    
    linprog(c, A_ub=None, b_ub=None, A_eq=None, b_eq=None, bounds=None, method='simplex', callback=None, options=None)
        Minimize a linear objective function subject to linear
        equality and inequality constraints.
        
        Linear Programming is intended to solve the following problem form:
        
        Minimize:     c^T * x
        
        Subject to:   A_ub * x <= b_ub
                      A_eq * x == b_eq
        
        Parameters
        ----------
        c : array_like
            Coefficients of the linear objective function to be minimized.
        A_ub : array_like, optional
            2-D array which, when matrix-multiplied by x, gives the values of the
            upper-bound inequality constraints at x.
        b_ub : array_like, optional
            1-D array of values representing the upper-bound of each inequality
            constraint (row) in A_ub.
        A_eq : array_like, optional
            2-D array which, when matrix-multiplied by x, gives the values of the
            equality constraints at x.
        b_eq : array_like, optional
            1-D array of values representing the RHS of each equality constraint
            (row) in A_eq.
        bounds : sequence, optional
            ``(min, max)`` pairs for each element in ``x``, defining
            the bounds on that parameter. Use None for one of ``min`` or
            ``max`` when there is no bound in that direction. By default
            bounds are ``(0, None)`` (non-negative)
            If a sequence containing a single tuple is provided, then ``min`` and
            ``max`` will be applied to all variables in the problem.
        method : str, optional
            Type of solver.  At this time only 'simplex' is supported
            :ref:`(see here) <optimize.linprog-simplex>`.
        callback : callable, optional
            If a callback function is provide, it will be called within each
            iteration of the simplex algorithm. The callback must have the signature
            `callback(xk, **kwargs)` where xk is the current solution vector
            and kwargs is a dictionary containing the following::
        
                "tableau" : The current Simplex algorithm tableau
                "nit" : The current iteration.
                "pivot" : The pivot (row, column) used for the next iteration.
                "phase" : Whether the algorithm is in Phase 1 or Phase 2.
                "basis" : The indices of the columns of the basic variables.
        
        options : dict, optional
            A dictionary of solver options. All methods accept the following
            generic options:
        
                maxiter : int
                    Maximum number of iterations to perform.
                disp : bool
                    Set to True to print convergence messages.
        
            For method-specific options, see `show_options('linprog')`.
        
        Returns
        -------
        A `scipy.optimize.OptimizeResult` consisting of the following fields:
        
            x : ndarray
                The independent variable vector which optimizes the linear
                programming problem.
            fun : float
                Value of the objective function.
            slack : ndarray
                The values of the slack variables.  Each slack variable corresponds
                to an inequality constraint.  If the slack is zero, then the
                corresponding constraint is active.
            success : bool
                Returns True if the algorithm succeeded in finding an optimal
                solution.
            status : int
                An integer representing the exit status of the optimization::
        
                     0 : Optimization terminated successfully
                     1 : Iteration limit reached
                     2 : Problem appears to be infeasible
                     3 : Problem appears to be unbounded
        
            nit : int
                The number of iterations performed.
            message : str
                A string descriptor of the exit status of the optimization.
        
        See Also
        --------
        show_options : Additional options accepted by the solvers
        
        Notes
        -----
        This section describes the available solvers that can be selected by the
        'method' parameter. The default method is :ref:`Simplex <optimize.linprog-simplex>`.
        
        Method *Simplex* uses the Simplex algorithm (as it relates to Linear
        Programming, NOT the Nelder-Mead Simplex) [1]_, [2]_. This algorithm
        should be reasonably reliable and fast.
        
        .. versionadded:: 0.15.0
        
        References
        ----------
        .. [1] Dantzig, George B., Linear programming and extensions. Rand
               Corporation Research Study Princeton Univ. Press, Princeton, NJ, 1963
        .. [2] Hillier, S.H. and Lieberman, G.J. (1995), "Introduction to
               Mathematical Programming", McGraw-Hill, Chapter 4.
        .. [3] Bland, Robert G. New finite pivoting rules for the simplex method.
               Mathematics of Operations Research (2), 1977: pp. 103-107.
        
        Examples
        --------
        Consider the following problem:
        
        Minimize: f = -1*x[0] + 4*x[1]
        
        Subject to: -3*x[0] + 1*x[1] <= 6
                     1*x[0] + 2*x[1] <= 4
                                x[1] >= -3
        
        where:  -inf <= x[0] <= inf
        
        This problem deviates from the standard linear programming problem.
        In standard form, linear programming problems assume the variables x are
        non-negative.  Since the variables don't have standard bounds where
        0 <= x <= inf, the bounds of the variables must be explicitly set.
        
        There are two upper-bound constraints, which can be expressed as
        
        dot(A_ub, x) <= b_ub
        
        The input for this problem is as follows:
        
        >>> c = [-1, 4]
        >>> A = [[-3, 1], [1, 2]]
        >>> b = [6, 4]
        >>> x0_bounds = (None, None)
        >>> x1_bounds = (-3, None)
        >>> from scipy.optimize import linprog
        >>> res = linprog(c, A_ub=A, b_ub=b, bounds=(x0_bounds, x1_bounds),
        ...               options={"disp": True})
        Optimization terminated successfully.
             Current function value: -22.000000
             Iterations: 1
        >>> print(res)
             fun: -22.0
         message: 'Optimization terminated successfully.'
             nit: 1
           slack: array([ 39.,   0.])
          status: 0
         success: True
               x: array([ 10.,  -3.])
        
        Note the actual objective value is 11.428571.  In this case we minimized
        the negative of the objective function.
    
    linprog_verbose_callback(xk, **kwargs)
        A sample callback function demonstrating the linprog callback interface.
        This callback produces detailed output to sys.stdout before each iteration
        and after the final iteration of the simplex algorithm.
        
        Parameters
        ----------
        xk : array_like
            The current solution vector.
        **kwargs : dict
            A dictionary containing the following parameters:
        
            tableau : array_like
                The current tableau of the simplex algorithm.
                Its structure is defined in _solve_simplex.
            phase : int
                The current Phase of the simplex algorithm (1 or 2)
            nit : int
                The current iteration number.
            pivot : tuple(int, int)
                The index of the tableau selected as the next pivot,
                or nan if no pivot exists
            basis : array(int)
                A list of the current basic variables.
                Each element contains the name of a basic variable and its value.
            complete : bool
                True if the simplex algorithm has completed
                (and this is the final call to callback), otherwise False.
    
    lsq_linear(A, b, bounds=(-inf, inf), method='trf', tol=1e-10, lsq_solver=None, lsmr_tol=None, max_iter=None, verbose=0)
        Solve a linear least-squares problem with bounds on the variables.
        
        Given a m-by-n design matrix A and a target vector b with m elements,
        `lsq_linear` solves the following optimization problem::
        
            minimize 0.5 * ||A x - b||**2
            subject to lb <= x <= ub
        
        This optimization problem is convex, hence a found minimum (if iterations
        have converged) is guaranteed to be global.
        
        Parameters
        ----------
        A : array_like, sparse matrix of LinearOperator, shape (m, n)
            Design matrix. Can be `scipy.sparse.linalg.LinearOperator`.
        b : array_like, shape (m,)
            Target vector.
        bounds : 2-tuple of array_like, optional
            Lower and upper bounds on independent variables. Defaults to no bounds.
            Each array must have shape (n,) or be a scalar, in the latter
            case a bound will be the same for all variables. Use ``np.inf`` with
            an appropriate sign to disable bounds on all or some variables.
        method : 'trf' or 'bvls', optional
            Method to perform minimization.
        
                * 'trf' : Trust Region Reflective algorithm adapted for a linear
                  least-squares problem. This is an interior-point-like method
                  and the required number of iterations is weakly correlated with
                  the number of variables.
                * 'bvls' : Bounded-Variable Least-Squares algorithm. This is
                  an active set method, which requires the number of iterations
                  comparable to the number of variables. Can't be used when `A` is
                  sparse or LinearOperator.
        
            Default is 'trf'.
        tol : float, optional
            Tolerance parameter. The algorithm terminates if a relative change
            of the cost function is less than `tol` on the last iteration.
            Additionally the first-order optimality measure is considered:
        
                * ``method='trf'`` terminates if the uniform norm of the gradient,
                  scaled to account for the presence of the bounds, is less than
                  `tol`.
                * ``method='bvls'`` terminates if Karush-Kuhn-Tucker conditions
                  are satisfied within `tol` tolerance.
        
        lsq_solver : {None, 'exact', 'lsmr'}, optional
            Method of solving unbounded least-squares problems throughout
            iterations:
        
                * 'exact' : Use dense QR or SVD decomposition approach. Can't be
                  used when `A` is sparse or LinearOperator.
                * 'lsmr' : Use `scipy.sparse.linalg.lsmr` iterative procedure
                  which requires only matrix-vector product evaluations. Can't
                  be used with ``method='bvls'``.
        
            If None (default) the solver is chosen based on type of `A`.
        lsmr_tol : None, float or 'auto', optional
            Tolerance parameters 'atol' and 'btol' for `scipy.sparse.linalg.lsmr`
            If None (default), it is set to ``1e-2 * tol``. If 'auto', the
            tolerance will be adjusted based on the optimality of the current
            iterate, which can speed up the optimization process, but is not always
            reliable.
        max_iter : None or int, optional
            Maximum number of iterations before termination. If None (default), it
            is set to 100 for ``method='trf'`` or to the number of variables for
            ``method='bvls'`` (not counting iterations for 'bvls' initialization).
        verbose : {0, 1, 2}, optional
            Level of algorithm's verbosity:
        
                * 0 : work silently (default).
                * 1 : display a termination report.
                * 2 : display progress during iterations.
        
        Returns
        -------
        OptimizeResult with the following fields defined:
        x : ndarray, shape (n,)
            Solution found.
        cost : float
            Value of the cost function at the solution.
        fun : ndarray, shape (m,)
            Vector of residuals at the solution.
        optimality : float
            First-order optimality measure. The exact meaning depends on `method`,
            refer to the description of `tol` parameter.
        active_mask : ndarray of int, shape (n,)
            Each component shows whether a corresponding constraint is active
            (that is, whether a variable is at the bound):
        
                *  0 : a constraint is not active.
                * -1 : a lower bound is active.
                *  1 : an upper bound is active.
        
            Might be somewhat arbitrary for the `trf` method as it generates a
            sequence of strictly feasible iterates and active_mask is determined
            within a tolerance threshold.
        nit : int
            Number of iterations. Zero if the unconstrained solution is optimal.
        status : int
            Reason for algorithm termination:
        
                * -1 : the algorithm was not able to make progress on the last
                  iteration.
                *  0 : the maximum number of iterations is exceeded.
                *  1 : the first-order optimality measure is less than `tol`.
                *  2 : the relative change of the cost function is less than `tol`.
                *  3 : the unconstrained solution is optimal.
        
        message : str
            Verbal description of the termination reason.
        success : bool
            True if one of the convergence criteria is satisfied (`status` > 0).
        
        See Also
        --------
        nnls : Linear least squares with non-negativity constraint.
        least_squares : Nonlinear least squares with bounds on the variables.                    
        
        Notes
        -----
        The algorithm first computes the unconstrained least-squares solution by
        `numpy.linalg.lstsq` or `scipy.sparse.linalg.lsmr` depending on
        `lsq_solver`. This solution is returned as optimal if it lies within the
        bounds.
        
        Method 'trf' runs the adaptation of the algorithm described in [STIR]_ for
        a linear least-squares problem. The iterations are essentially the same as
        in the nonlinear least-squares algorithm, but as the quadratic function
        model is always accurate, we don't need to track or modify the radius of
        a trust region. The line search (backtracking) is used as a safety net
        when a selected step does not decrease the cost function. Read more
        detailed description of the algorithm in `scipy.optimize.least_squares`.
        
        Method 'bvls' runs a Python implementation of the algorithm described in
        [BVLS]_. The algorithm maintains active and free sets of variables, on
        each iteration chooses a new variable to move from the active set to the
        free set and then solves the unconstrained least-squares problem on free
        variables. This algorithm is guaranteed to give an accurate solution
        eventually, but may require up to n iterations for a problem with n
        variables. Additionally, an ad-hoc initialization procedure is
        implemented, that determines which variables to set free or active
        initially. It takes some number of iterations before actual BVLS starts,
        but can significantly reduce the number of further iterations.
        
        References
        ----------
        .. [STIR] M. A. Branch, T. F. Coleman, and Y. Li, "A Subspace, Interior,
                  and Conjugate Gradient Method for Large-Scale Bound-Constrained
                  Minimization Problems," SIAM Journal on Scientific Computing,
                  Vol. 21, Number 1, pp 1-23, 1999.
        .. [BVLS] P. B. Start and R. L. Parker, "Bounded-Variable Least-Squares:
                  an Algorithm and Applications", Computational Statistics, 10,
                  129-141, 1995.
        
        Examples
        --------
        In this example a problem with a large sparse matrix and bounds on the
        variables is solved.
        
        >>> from scipy.sparse import rand
        >>> from scipy.optimize import lsq_linear
        ...
        >>> np.random.seed(0)
        ...
        >>> m = 20000
        >>> n = 10000
        ...
        >>> A = rand(m, n, density=1e-4)
        >>> b = np.random.randn(m)
        ...
        >>> lb = np.random.randn(n)
        >>> ub = lb + 1
        ...
        >>> res = lsq_linear(A, b, bounds=(lb, ub), lsmr_tol='auto', verbose=1)
        # may vary
        The relative change of the cost function is less than `tol`.
        Number of iterations 16, initial cost 1.5039e+04, final cost 1.1112e+04,
        first-order optimality 4.66e-08.
    
    minimize(fun, x0, args=(), method=None, jac=None, hess=None, hessp=None, bounds=None, constraints=(), tol=None, callback=None, options=None)
        Minimization of scalar function of one or more variables.
        
        In general, the optimization problems are of the form::
        
            minimize f(x) subject to
        
            g_i(x) >= 0,  i = 1,...,m
            h_j(x)  = 0,  j = 1,...,p
        
        where x is a vector of one or more variables.
        ``g_i(x)`` are the inequality constraints.
        ``h_j(x)`` are the equality constrains.
        
        Optionally, the lower and upper bounds for each element in x can also be
        specified using the `bounds` argument.
        
        Parameters
        ----------
        fun : callable
            Objective function.
        x0 : ndarray
            Initial guess.
        args : tuple, optional
            Extra arguments passed to the objective function and its
            derivatives (Jacobian, Hessian).
        method : str or callable, optional
            Type of solver.  Should be one of
        
                - 'Nelder-Mead' :ref:`(see here) <optimize.minimize-neldermead>`
                - 'Powell'      :ref:`(see here) <optimize.minimize-powell>`
                - 'CG'          :ref:`(see here) <optimize.minimize-cg>`
                - 'BFGS'        :ref:`(see here) <optimize.minimize-bfgs>`
                - 'Newton-CG'   :ref:`(see here) <optimize.minimize-newtoncg>`
                - 'L-BFGS-B'    :ref:`(see here) <optimize.minimize-lbfgsb>`
                - 'TNC'         :ref:`(see here) <optimize.minimize-tnc>`
                - 'COBYLA'      :ref:`(see here) <optimize.minimize-cobyla>`
                - 'SLSQP'       :ref:`(see here) <optimize.minimize-slsqp>`
                - 'dogleg'      :ref:`(see here) <optimize.minimize-dogleg>`
                - 'trust-ncg'   :ref:`(see here) <optimize.minimize-trustncg>`
                - custom - a callable object (added in version 0.14.0),
                  see below for description.
        
            If not given, chosen to be one of ``BFGS``, ``L-BFGS-B``, ``SLSQP``,
            depending if the problem has constraints or bounds.
        jac : bool or callable, optional
            Jacobian (gradient) of objective function. Only for CG, BFGS,
            Newton-CG, L-BFGS-B, TNC, SLSQP, dogleg, trust-ncg.
            If `jac` is a Boolean and is True, `fun` is assumed to return the
            gradient along with the objective function. If False, the
            gradient will be estimated numerically.
            `jac` can also be a callable returning the gradient of the
            objective. In this case, it must accept the same arguments as `fun`.
        hess, hessp : callable, optional
            Hessian (matrix of second-order derivatives) of objective function or
            Hessian of objective function times an arbitrary vector p.  Only for
            Newton-CG, dogleg, trust-ncg.
            Only one of `hessp` or `hess` needs to be given.  If `hess` is
            provided, then `hessp` will be ignored.  If neither `hess` nor
            `hessp` is provided, then the Hessian product will be approximated
            using finite differences on `jac`. `hessp` must compute the Hessian
            times an arbitrary vector.
        bounds : sequence, optional
            Bounds for variables (only for L-BFGS-B, TNC and SLSQP).
            ``(min, max)`` pairs for each element in ``x``, defining
            the bounds on that parameter. Use None for one of ``min`` or
            ``max`` when there is no bound in that direction.
        constraints : dict or sequence of dict, optional
            Constraints definition (only for COBYLA and SLSQP).
            Each constraint is defined in a dictionary with fields:
        
                type : str
                    Constraint type: 'eq' for equality, 'ineq' for inequality.
                fun : callable
                    The function defining the constraint.
                jac : callable, optional
                    The Jacobian of `fun` (only for SLSQP).
                args : sequence, optional
                    Extra arguments to be passed to the function and Jacobian.
        
            Equality constraint means that the constraint function result is to
            be zero whereas inequality means that it is to be non-negative.
            Note that COBYLA only supports inequality constraints.
        tol : float, optional
            Tolerance for termination. For detailed control, use solver-specific
            options.
        options : dict, optional
            A dictionary of solver options. All methods accept the following
            generic options:
        
                maxiter : int
                    Maximum number of iterations to perform.
                disp : bool
                    Set to True to print convergence messages.
        
            For method-specific options, see :func:`show_options()`.
        callback : callable, optional
            Called after each iteration, as ``callback(xk)``, where ``xk`` is the
            current parameter vector.
        
        Returns
        -------
        res : OptimizeResult
            The optimization result represented as a ``OptimizeResult`` object.
            Important attributes are: ``x`` the solution array, ``success`` a
            Boolean flag indicating if the optimizer exited successfully and
            ``message`` which describes the cause of the termination. See
            `OptimizeResult` for a description of other attributes.
        
        
        See also
        --------
        minimize_scalar : Interface to minimization algorithms for scalar
            univariate functions
        show_options : Additional options accepted by the solvers
        
        Notes
        -----
        This section describes the available solvers that can be selected by the
        'method' parameter. The default method is *BFGS*.
        
        **Unconstrained minimization**
        
        Method :ref:`Nelder-Mead <optimize.minimize-neldermead>` uses the
        Simplex algorithm [1]_, [2]_. This algorithm is robust in many
        applications. However, if numerical computation of derivative can be
        trusted, other algorithms using the first and/or second derivatives
        information might be preferred for their better performance in
        general.
        
        Method :ref:`Powell <optimize.minimize-powell>` is a modification
        of Powell's method [3]_, [4]_ which is a conjugate direction
        method. It performs sequential one-dimensional minimizations along
        each vector of the directions set (`direc` field in `options` and
        `info`), which is updated at each iteration of the main
        minimization loop. The function need not be differentiable, and no
        derivatives are taken.
        
        Method :ref:`CG <optimize.minimize-cg>` uses a nonlinear conjugate
        gradient algorithm by Polak and Ribiere, a variant of the
        Fletcher-Reeves method described in [5]_ pp.  120-122. Only the
        first derivatives are used.
        
        Method :ref:`BFGS <optimize.minimize-bfgs>` uses the quasi-Newton
        method of Broyden, Fletcher, Goldfarb, and Shanno (BFGS) [5]_
        pp. 136. It uses the first derivatives only. BFGS has proven good
        performance even for non-smooth optimizations. This method also
        returns an approximation of the Hessian inverse, stored as
        `hess_inv` in the OptimizeResult object.
        
        Method :ref:`Newton-CG <optimize.minimize-newtoncg>` uses a
        Newton-CG algorithm [5]_ pp. 168 (also known as the truncated
        Newton method). It uses a CG method to the compute the search
        direction. See also *TNC* method for a box-constrained
        minimization with a similar algorithm.
        
        Method :ref:`dogleg <optimize.minimize-dogleg>` uses the dog-leg
        trust-region algorithm [5]_ for unconstrained minimization. This
        algorithm requires the gradient and Hessian; furthermore the
        Hessian is required to be positive definite.
        
        Method :ref:`trust-ncg <optimize.minimize-trustncg>` uses the
        Newton conjugate gradient trust-region algorithm [5]_ for
        unconstrained minimization. This algorithm requires the gradient
        and either the Hessian or a function that computes the product of
        the Hessian with a given vector.
        
        **Constrained minimization**
        
        Method :ref:`L-BFGS-B <optimize.minimize-lbfgsb>` uses the L-BFGS-B
        algorithm [6]_, [7]_ for bound constrained minimization.
        
        Method :ref:`TNC <optimize.minimize-tnc>` uses a truncated Newton
        algorithm [5]_, [8]_ to minimize a function with variables subject
        to bounds. This algorithm uses gradient information; it is also
        called Newton Conjugate-Gradient. It differs from the *Newton-CG*
        method described above as it wraps a C implementation and allows
        each variable to be given upper and lower bounds.
        
        Method :ref:`COBYLA <optimize.minimize-cobyla>` uses the
        Constrained Optimization BY Linear Approximation (COBYLA) method
        [9]_, [10]_, [11]_. The algorithm is based on linear
        approximations to the objective function and each constraint. The
        method wraps a FORTRAN implementation of the algorithm. The
        constraints functions 'fun' may return either a single number
        or an array or list of numbers.
        
        Method :ref:`SLSQP <optimize.minimize-slsqp>` uses Sequential
        Least SQuares Programming to minimize a function of several
        variables with any combination of bounds, equality and inequality
        constraints. The method wraps the SLSQP Optimization subroutine
        originally implemented by Dieter Kraft [12]_. Note that the
        wrapper handles infinite values in bounds by converting them into
        large floating values.
        
        **Custom minimizers**
        
        It may be useful to pass a custom minimization method, for example
        when using a frontend to this method such as `scipy.optimize.basinhopping`
        or a different library.  You can simply pass a callable as the ``method``
        parameter.
        
        The callable is called as ``method(fun, x0, args, **kwargs, **options)``
        where ``kwargs`` corresponds to any other parameters passed to `minimize`
        (such as `callback`, `hess`, etc.), except the `options` dict, which has
        its contents also passed as `method` parameters pair by pair.  Also, if
        `jac` has been passed as a bool type, `jac` and `fun` are mangled so that
        `fun` returns just the function values and `jac` is converted to a function
        returning the Jacobian.  The method shall return an ``OptimizeResult``
        object.
        
        The provided `method` callable must be able to accept (and possibly ignore)
        arbitrary parameters; the set of parameters accepted by `minimize` may
        expand in future versions and then these parameters will be passed to
        the method.  You can find an example in the scipy.optimize tutorial.
        
        .. versionadded:: 0.11.0
        
        References
        ----------
        .. [1] Nelder, J A, and R Mead. 1965. A Simplex Method for Function
            Minimization. The Computer Journal 7: 308-13.
        .. [2] Wright M H. 1996. Direct search methods: Once scorned, now
            respectable, in Numerical Analysis 1995: Proceedings of the 1995
            Dundee Biennial Conference in Numerical Analysis (Eds. D F
            Griffiths and G A Watson). Addison Wesley Longman, Harlow, UK.
            191-208.
        .. [3] Powell, M J D. 1964. An efficient method for finding the minimum of
           a function of several variables without calculating derivatives. The
           Computer Journal 7: 155-162.
        .. [4] Press W, S A Teukolsky, W T Vetterling and B P Flannery.
           Numerical Recipes (any edition), Cambridge University Press.
        .. [5] Nocedal, J, and S J Wright. 2006. Numerical Optimization.
           Springer New York.
        .. [6] Byrd, R H and P Lu and J. Nocedal. 1995. A Limited Memory
           Algorithm for Bound Constrained Optimization. SIAM Journal on
           Scientific and Statistical Computing 16 (5): 1190-1208.
        .. [7] Zhu, C and R H Byrd and J Nocedal. 1997. L-BFGS-B: Algorithm
           778: L-BFGS-B, FORTRAN routines for large scale bound constrained
           optimization. ACM Transactions on Mathematical Software 23 (4):
           550-560.
        .. [8] Nash, S G. Newton-Type Minimization Via the Lanczos Method.
           1984. SIAM Journal of Numerical Analysis 21: 770-778.
        .. [9] Powell, M J D. A direct search optimization method that models
           the objective and constraint functions by linear interpolation.
           1994. Advances in Optimization and Numerical Analysis, eds. S. Gomez
           and J-P Hennart, Kluwer Academic (Dordrecht), 51-67.
        .. [10] Powell M J D. Direct search algorithms for optimization
           calculations. 1998. Acta Numerica 7: 287-336.
        .. [11] Powell M J D. A view of algorithms for optimization without
           derivatives. 2007.Cambridge University Technical Report DAMTP
           2007/NA03
        .. [12] Kraft, D. A software package for sequential quadratic
           programming. 1988. Tech. Rep. DFVLR-FB 88-28, DLR German Aerospace
           Center -- Institute for Flight Mechanics, Koln, Germany.
        
        Examples
        --------
        Let us consider the problem of minimizing the Rosenbrock function. This
        function (and its respective derivatives) is implemented in `rosen`
        (resp. `rosen_der`, `rosen_hess`) in the `scipy.optimize`.
        
        >>> from scipy.optimize import minimize, rosen, rosen_der
        
        A simple application of the *Nelder-Mead* method is:
        
        >>> x0 = [1.3, 0.7, 0.8, 1.9, 1.2]
        >>> res = minimize(rosen, x0, method='Nelder-Mead', tol=1e-6)
        >>> res.x
        array([ 1.,  1.,  1.,  1.,  1.])
        
        Now using the *BFGS* algorithm, using the first derivative and a few
        options:
        
        >>> res = minimize(rosen, x0, method='BFGS', jac=rosen_der,
        ...                options={'gtol': 1e-6, 'disp': True})
        Optimization terminated successfully.
                 Current function value: 0.000000
                 Iterations: 26
                 Function evaluations: 31
                 Gradient evaluations: 31
        >>> res.x
        array([ 1.,  1.,  1.,  1.,  1.])
        >>> print(res.message)
        Optimization terminated successfully.
        >>> res.hess_inv
        array([[ 0.00749589,  0.01255155,  0.02396251,  0.04750988,  0.09495377],  # may vary
               [ 0.01255155,  0.02510441,  0.04794055,  0.09502834,  0.18996269],
               [ 0.02396251,  0.04794055,  0.09631614,  0.19092151,  0.38165151],
               [ 0.04750988,  0.09502834,  0.19092151,  0.38341252,  0.7664427 ],
               [ 0.09495377,  0.18996269,  0.38165151,  0.7664427,   1.53713523]])
        
        
        Next, consider a minimization problem with several constraints (namely
        Example 16.4 from [5]_). The objective function is:
        
        >>> fun = lambda x: (x[0] - 1)**2 + (x[1] - 2.5)**2
        
        There are three constraints defined as:
        
        >>> cons = ({'type': 'ineq', 'fun': lambda x:  x[0] - 2 * x[1] + 2},
        ...         {'type': 'ineq', 'fun': lambda x: -x[0] - 2 * x[1] + 6},
        ...         {'type': 'ineq', 'fun': lambda x: -x[0] + 2 * x[1] + 2})
        
        And variables must be positive, hence the following bounds:
        
        >>> bnds = ((0, None), (0, None))
        
        The optimization problem is solved using the SLSQP method as:
        
        >>> res = minimize(fun, (2, 0), method='SLSQP', bounds=bnds,
        ...                constraints=cons)
        
        It should converge to the theoretical solution (1.4 ,1.7).
    
    minimize_scalar(fun, bracket=None, bounds=None, args=(), method='brent', tol=None, options=None)
        Minimization of scalar function of one variable.
        
        Parameters
        ----------
        fun : callable
            Objective function.
            Scalar function, must return a scalar.
        bracket : sequence, optional
            For methods 'brent' and 'golden', `bracket` defines the bracketing
            interval and can either have three items `(a, b, c)` so that `a < b
            < c` and `fun(b) < fun(a), fun(c)` or two items `a` and `c` which
            are assumed to be a starting interval for a downhill bracket search
            (see `bracket`); it doesn't always mean that the obtained solution
            will satisfy `a <= x <= c`.
        bounds : sequence, optional
            For method 'bounded', `bounds` is mandatory and must have two items
            corresponding to the optimization bounds.
        args : tuple, optional
            Extra arguments passed to the objective function.
        method : str or callable, optional
            Type of solver.  Should be one of
        
                - 'Brent'     :ref:`(see here) <optimize.minimize_scalar-brent>`
                - 'Bounded'   :ref:`(see here) <optimize.minimize_scalar-bounded>`
                - 'Golden'    :ref:`(see here) <optimize.minimize_scalar-golden>`
                - custom - a callable object (added in version 0.14.0),
                  see below
        tol : float, optional
            Tolerance for termination. For detailed control, use solver-specific
            options.
        options : dict, optional
            A dictionary of solver options.
                maxiter : int
                    Maximum number of iterations to perform.
                disp : bool
                    Set to True to print convergence messages.
        
            See :func:`show_options()` for solver-specific options.
        
        Returns
        -------
        res : OptimizeResult
            The optimization result represented as a ``OptimizeResult`` object.
            Important attributes are: ``x`` the solution array, ``success`` a
            Boolean flag indicating if the optimizer exited successfully and
            ``message`` which describes the cause of the termination. See
            `OptimizeResult` for a description of other attributes.
        
        See also
        --------
        minimize : Interface to minimization algorithms for scalar multivariate
            functions
        show_options : Additional options accepted by the solvers
        
        Notes
        -----
        This section describes the available solvers that can be selected by the
        'method' parameter. The default method is *Brent*.
        
        Method :ref:`Brent <optimize.minimize_scalar-brent>` uses Brent's
        algorithm to find a local minimum.  The algorithm uses inverse
        parabolic interpolation when possible to speed up convergence of
        the golden section method.
        
        Method :ref:`Golden <optimize.minimize_scalar-golden>` uses the
        golden section search technique. It uses analog of the bisection
        method to decrease the bracketed interval. It is usually
        preferable to use the *Brent* method.
        
        Method :ref:`Bounded <optimize.minimize_scalar-bounded>` can
        perform bounded minimization. It uses the Brent method to find a
        local minimum in the interval x1 < xopt < x2.
        
        **Custom minimizers**
        
        It may be useful to pass a custom minimization method, for example
        when using some library frontend to minimize_scalar.  You can simply
        pass a callable as the ``method`` parameter.
        
        The callable is called as ``method(fun, args, **kwargs, **options)``
        where ``kwargs`` corresponds to any other parameters passed to `minimize`
        (such as `bracket`, `tol`, etc.), except the `options` dict, which has
        its contents also passed as `method` parameters pair by pair.  The method
        shall return an ``OptimizeResult`` object.
        
        The provided `method` callable must be able to accept (and possibly ignore)
        arbitrary parameters; the set of parameters accepted by `minimize` may
        expand in future versions and then these parameters will be passed to
        the method.  You can find an example in the scipy.optimize tutorial.
        
        .. versionadded:: 0.11.0
        
        Examples
        --------
        Consider the problem of minimizing the following function.
        
        >>> def f(x):
        ...     return (x - 2) * x * (x + 2)**2
        
        Using the *Brent* method, we find the local minimum as:
        
        >>> from scipy.optimize import minimize_scalar
        >>> res = minimize_scalar(f)
        >>> res.x
        1.28077640403
        
        Using the *Bounded* method, we find a local minimum with specified
        bounds as:
        
        >>> res = minimize_scalar(f, bounds=(-3, -1), method='bounded')
        >>> res.x
        -2.0000002026
    
    newton(func, x0, fprime=None, args=(), tol=1.48e-08, maxiter=50, fprime2=None)
        Find a zero using the Newton-Raphson or secant method.
        
        Find a zero of the function `func` given a nearby starting point `x0`.
        The Newton-Raphson method is used if the derivative `fprime` of `func`
        is provided, otherwise the secant method is used.  If the second order
        derivate `fprime2` of `func` is provided, parabolic Halley's method
        is used.
        
        Parameters
        ----------
        func : function
            The function whose zero is wanted. It must be a function of a
            single variable of the form f(x,a,b,c...), where a,b,c... are extra
            arguments that can be passed in the `args` parameter.
        x0 : float
            An initial estimate of the zero that should be somewhere near the
            actual zero.
        fprime : function, optional
            The derivative of the function when available and convenient. If it
            is None (default), then the secant method is used.
        args : tuple, optional
            Extra arguments to be used in the function call.
        tol : float, optional
            The allowable error of the zero value.
        maxiter : int, optional
            Maximum number of iterations.
        fprime2 : function, optional
            The second order derivative of the function when available and
            convenient. If it is None (default), then the normal Newton-Raphson
            or the secant method is used. If it is given, parabolic Halley's
            method is used.
        
        Returns
        -------
        zero : float
            Estimated location where function is zero.
        
        See Also
        --------
        brentq, brenth, ridder, bisect
        fsolve : find zeroes in n dimensions.
        
        Notes
        -----
        The convergence rate of the Newton-Raphson method is quadratic,
        the Halley method is cubic, and the secant method is
        sub-quadratic.  This means that if the function is well behaved
        the actual error in the estimated zero is approximately the square
        (cube for Halley) of the requested tolerance up to roundoff
        error. However, the stopping criterion used here is the step size
        and there is no guarantee that a zero has been found. Consequently
        the result should be verified. Safer algorithms are brentq,
        brenth, ridder, and bisect, but they all require that the root
        first be bracketed in an interval where the function changes
        sign. The brentq algorithm is recommended for general use in one
        dimensional problems when such an interval has been found.
    
    newton_krylov(F, xin, iter=None, rdiff=None, method='lgmres', inner_maxiter=20, inner_M=None, outer_k=10, verbose=False, maxiter=None, f_tol=None, f_rtol=None, x_tol=None, x_rtol=None, tol_norm=None, line_search='armijo', callback=None, **kw)
        Find a root of a function, using Krylov approximation for inverse Jacobian.
        
        This method is suitable for solving large-scale problems.
        
        Parameters
        ----------
        F : function(x) -> f
            Function whose root to find; should take and return an array-like
            object.
        x0 : array_like
            Initial guess for the solution
        rdiff : float, optional
            Relative step size to use in numerical differentiation.
        method : {'lgmres', 'gmres', 'bicgstab', 'cgs', 'minres'} or function
            Krylov method to use to approximate the Jacobian.
            Can be a string, or a function implementing the same interface as
            the iterative solvers in `scipy.sparse.linalg`.
        
            The default is `scipy.sparse.linalg.lgmres`.
        inner_M : LinearOperator or InverseJacobian
            Preconditioner for the inner Krylov iteration.
            Note that you can use also inverse Jacobians as (adaptive)
            preconditioners. For example,
        
            >>> from scipy.optimize.nonlin import BroydenFirst, KrylovJacobian
            >>> from scipy.optimize.nonlin import InverseJacobian
            >>> jac = BroydenFirst()
            >>> kjac = KrylovJacobian(inner_M=InverseJacobian(jac))
        
            If the preconditioner has a method named 'update', it will be called
            as ``update(x, f)`` after each nonlinear step, with ``x`` giving
            the current point, and ``f`` the current function value.
        inner_tol, inner_maxiter, ...
            Parameters to pass on to the \"inner\" Krylov solver.
            See `scipy.sparse.linalg.gmres` for details.
        outer_k : int, optional
            Size of the subspace kept across LGMRES nonlinear iterations.
            See `scipy.sparse.linalg.lgmres` for details.
        iter : int, optional
            Number of iterations to make. If omitted (default), make as many
            as required to meet tolerances.
        verbose : bool, optional
            Print status to stdout on every iteration.
        maxiter : int, optional
            Maximum number of iterations to make. If more are needed to
            meet convergence, `NoConvergence` is raised.
        f_tol : float, optional
            Absolute tolerance (in max-norm) for the residual.
            If omitted, default is 6e-6.
        f_rtol : float, optional
            Relative tolerance for the residual. If omitted, not used.
        x_tol : float, optional
            Absolute minimum step size, as determined from the Jacobian
            approximation. If the step size is smaller than this, optimization
            is terminated as successful. If omitted, not used.
        x_rtol : float, optional
            Relative minimum step size. If omitted, not used.
        tol_norm : function(vector) -> scalar, optional
            Norm to use in convergence check. Default is the maximum norm.
        line_search : {None, 'armijo' (default), 'wolfe'}, optional
            Which type of a line search to use to determine the step size in the
            direction given by the Jacobian approximation. Defaults to 'armijo'.
        callback : function, optional
            Optional callback function. It is called on every iteration as
            ``callback(x, f)`` where `x` is the current solution and `f`
            the corresponding residual.
        
        Returns
        -------
        sol : ndarray
            An array (of similar array type as `x0`) containing the final solution.
        
        Raises
        ------
        NoConvergence
            When a solution was not found.
        
        See Also
        --------
        scipy.sparse.linalg.gmres
        scipy.sparse.linalg.lgmres
        
        Notes
        -----
        This function implements a Newton-Krylov solver. The basic idea is
        to compute the inverse of the Jacobian with an iterative Krylov
        method. These methods require only evaluating the Jacobian-vector
        products, which are conveniently approximated by a finite difference:
        
        .. math:: J v \approx (f(x + \omega*v/|v|) - f(x)) / \omega
        
        Due to the use of iterative matrix inverses, these methods can
        deal with large nonlinear problems.
        
        Scipy's `scipy.sparse.linalg` module offers a selection of Krylov
        solvers to choose from. The default here is `lgmres`, which is a
        variant of restarted GMRES iteration that reuses some of the
        information obtained in the previous Newton steps to invert
        Jacobians in subsequent steps.
        
        For a review on Newton-Krylov methods, see for example [1]_,
        and for the LGMRES sparse inverse method, see [2]_.
        
        References
        ----------
        .. [1] D.A. Knoll and D.E. Keyes, J. Comp. Phys. 193, 357 (2004).
               doi:10.1016/j.jcp.2003.08.010
        .. [2] A.H. Baker and E.R. Jessup and T. Manteuffel,
               SIAM J. Matrix Anal. Appl. 26, 962 (2005).
               doi:10.1137/S0895479803422014
    
    nnls(A, b)
        Solve ``argmin_x || Ax - b ||_2`` for ``x>=0``. This is a wrapper
        for a FORTAN non-negative least squares solver.
        
        Parameters
        ----------
        A : ndarray
            Matrix ``A`` as shown above.
        b : ndarray
            Right-hand side vector.
        
        Returns
        -------
        x : ndarray
            Solution vector.
        rnorm : float
            The residual, ``|| Ax-b ||_2``.
        
        Notes
        -----
        The FORTRAN code was published in the book below. The algorithm
        is an active set method. It solves the KKT (Karush-Kuhn-Tucker)
        conditions for the non-negative least squares problem.
        
        References
        ----------
        Lawson C., Hanson R.J., (1987) Solving Least Squares Problems, SIAM
    
    ridder(f, a, b, args=(), xtol=2e-12, rtol=8.8817841970012523e-16, maxiter=100, full_output=False, disp=True)
        Find a root of a function in an interval.
        
        Parameters
        ----------
        f : function
            Python function returning a number.  f must be continuous, and f(a) and
            f(b) must have opposite signs.
        a : number
            One end of the bracketing interval [a,b].
        b : number
            The other end of the bracketing interval [a,b].
        xtol : number, optional
            The computed root ``x0`` will satisfy ``np.allclose(x, x0,
            atol=xtol, rtol=rtol)``, where ``x`` is the exact root. The
            parameter must be nonnegative.
        rtol : number, optional
            The computed root ``x0`` will satisfy ``np.allclose(x, x0,
            atol=xtol, rtol=rtol)``, where ``x`` is the exact root. The
            parameter cannot be smaller than its default value of
            ``4*np.finfo(float).eps``.
        maxiter : number, optional
            if convergence is not achieved in maxiter iterations, an error is
            raised.  Must be >= 0.
        args : tuple, optional
            containing extra arguments for the function `f`.
            `f` is called by ``apply(f, (x)+args)``.
        full_output : bool, optional
            If `full_output` is False, the root is returned.  If `full_output` is
            True, the return value is ``(x, r)``, where `x` is the root, and `r` is
            a RootResults object.
        disp : bool, optional
            If True, raise RuntimeError if the algorithm didn't converge.
        
        Returns
        -------
        x0 : float
            Zero of `f` between `a` and `b`.
        r : RootResults (present if ``full_output = True``)
            Object containing information about the convergence.
            In particular, ``r.converged`` is True if the routine converged.
        
        See Also
        --------
        brentq, brenth, bisect, newton : one-dimensional root-finding
        fixed_point : scalar fixed-point finder
        
        Notes
        -----
        Uses [Ridders1979]_ method to find a zero of the function `f` between the
        arguments `a` and `b`. Ridders' method is faster than bisection, but not
        generally as fast as the Brent rountines. [Ridders1979]_ provides the
        classic description and source of the algorithm. A description can also be
        found in any recent edition of Numerical Recipes.
        
        The routine used here diverges slightly from standard presentations in
        order to be a bit more careful of tolerance.
        
        References
        ----------
        .. [Ridders1979]
           Ridders, C. F. J. "A New Algorithm for Computing a
           Single Root of a Real Continuous Function."
           IEEE Trans. Circuits Systems 26, 979-980, 1979.
    
    root(fun, x0, args=(), method='hybr', jac=None, tol=None, callback=None, options=None)
        Find a root of a vector function.
        
        Parameters
        ----------
        fun : callable
            A vector function to find a root of.
        x0 : ndarray
            Initial guess.
        args : tuple, optional
            Extra arguments passed to the objective function and its Jacobian.
        method : str, optional
            Type of solver.  Should be one of
        
                - 'hybr'             :ref:`(see here) <optimize.root-hybr>`
                - 'lm'               :ref:`(see here) <optimize.root-lm>`
                - 'broyden1'         :ref:`(see here) <optimize.root-broyden1>`
                - 'broyden2'         :ref:`(see here) <optimize.root-broyden2>`
                - 'anderson'         :ref:`(see here) <optimize.root-anderson>`
                - 'linearmixing'     :ref:`(see here) <optimize.root-linearmixing>`
                - 'diagbroyden'      :ref:`(see here) <optimize.root-diagbroyden>`
                - 'excitingmixing'   :ref:`(see here) <optimize.root-excitingmixing>`
                - 'krylov'           :ref:`(see here) <optimize.root-krylov>`
                - 'df-sane'          :ref:`(see here) <optimize.root-dfsane>`
        
        jac : bool or callable, optional
            If `jac` is a Boolean and is True, `fun` is assumed to return the
            value of Jacobian along with the objective function. If False, the
            Jacobian will be estimated numerically.
            `jac` can also be a callable returning the Jacobian of `fun`. In
            this case, it must accept the same arguments as `fun`.
        tol : float, optional
            Tolerance for termination. For detailed control, use solver-specific
            options.
        callback : function, optional
            Optional callback function. It is called on every iteration as
            ``callback(x, f)`` where `x` is the current solution and `f`
            the corresponding residual. For all methods but 'hybr' and 'lm'.
        options : dict, optional
            A dictionary of solver options. E.g. `xtol` or `maxiter`, see
            :obj:`show_options()` for details.
        
        Returns
        -------
        sol : OptimizeResult
            The solution represented as a ``OptimizeResult`` object.
            Important attributes are: ``x`` the solution array, ``success`` a
            Boolean flag indicating if the algorithm exited successfully and
            ``message`` which describes the cause of the termination. See
            `OptimizeResult` for a description of other attributes.
        
        See also
        --------
        show_options : Additional options accepted by the solvers
        
        Notes
        -----
        This section describes the available solvers that can be selected by the
        'method' parameter. The default method is *hybr*.
        
        Method *hybr* uses a modification of the Powell hybrid method as
        implemented in MINPACK [1]_.
        
        Method *lm* solves the system of nonlinear equations in a least squares
        sense using a modification of the Levenberg-Marquardt algorithm as
        implemented in MINPACK [1]_.
        
        Method *df-sane* is a derivative-free spectral method. [3]_
        
        Methods *broyden1*, *broyden2*, *anderson*, *linearmixing*,
        *diagbroyden*, *excitingmixing*, *krylov* are inexact Newton methods,
        with backtracking or full line searches [2]_. Each method corresponds
        to a particular Jacobian approximations. See `nonlin` for details.
        
        - Method *broyden1* uses Broyden's first Jacobian approximation, it is
          known as Broyden's good method.
        - Method *broyden2* uses Broyden's second Jacobian approximation, it
          is known as Broyden's bad method.
        - Method *anderson* uses (extended) Anderson mixing.
        - Method *Krylov* uses Krylov approximation for inverse Jacobian. It
          is suitable for large-scale problem.
        - Method *diagbroyden* uses diagonal Broyden Jacobian approximation.
        - Method *linearmixing* uses a scalar Jacobian approximation.
        - Method *excitingmixing* uses a tuned diagonal Jacobian
          approximation.
        
        .. warning::
        
            The algorithms implemented for methods *diagbroyden*,
            *linearmixing* and *excitingmixing* may be useful for specific
            problems, but whether they will work may depend strongly on the
            problem.
        
        .. versionadded:: 0.11.0
        
        References
        ----------
        .. [1] More, Jorge J., Burton S. Garbow, and Kenneth E. Hillstrom.
           1980. User Guide for MINPACK-1.
        .. [2] C. T. Kelley. 1995. Iterative Methods for Linear and Nonlinear
            Equations. Society for Industrial and Applied Mathematics.
            <http://www.siam.org/books/kelley/>
        .. [3] W. La Cruz, J.M. Martinez, M. Raydan. Math. Comp. 75, 1429 (2006).
        
        Examples
        --------
        The following functions define a system of nonlinear equations and its
        jacobian.
        
        >>> def fun(x):
        ...     return [x[0]  + 0.5 * (x[0] - x[1])**3 - 1.0,
        ...             0.5 * (x[1] - x[0])**3 + x[1]]
        
        >>> def jac(x):
        ...     return np.array([[1 + 1.5 * (x[0] - x[1])**2,
        ...                       -1.5 * (x[0] - x[1])**2],
        ...                      [-1.5 * (x[1] - x[0])**2,
        ...                       1 + 1.5 * (x[1] - x[0])**2]])
        
        A solution can be obtained as follows.
        
        >>> from scipy import optimize
        >>> sol = optimize.root(fun, [0, 0], jac=jac, method='hybr')
        >>> sol.x
        array([ 0.8411639,  0.1588361])
    
    rosen(x)
        The Rosenbrock function.
        
        The function computed is::
        
            sum(100.0*(x[1:] - x[:-1]**2.0)**2.0 + (1 - x[:-1])**2.0)
        
        Parameters
        ----------
        x : array_like
            1-D array of points at which the Rosenbrock function is to be computed.
        
        Returns
        -------
        f : float
            The value of the Rosenbrock function.
        
        See Also
        --------
        rosen_der, rosen_hess, rosen_hess_prod
    
    rosen_der(x)
        The derivative (i.e. gradient) of the Rosenbrock function.
        
        Parameters
        ----------
        x : array_like
            1-D array of points at which the derivative is to be computed.
        
        Returns
        -------
        rosen_der : (N,) ndarray
            The gradient of the Rosenbrock function at `x`.
        
        See Also
        --------
        rosen, rosen_hess, rosen_hess_prod
    
    rosen_hess(x)
        The Hessian matrix of the Rosenbrock function.
        
        Parameters
        ----------
        x : array_like
            1-D array of points at which the Hessian matrix is to be computed.
        
        Returns
        -------
        rosen_hess : ndarray
            The Hessian matrix of the Rosenbrock function at `x`.
        
        See Also
        --------
        rosen, rosen_der, rosen_hess_prod
    
    rosen_hess_prod(x, p)
        Product of the Hessian matrix of the Rosenbrock function with a vector.
        
        Parameters
        ----------
        x : array_like
            1-D array of points at which the Hessian matrix is to be computed.
        p : array_like
            1-D array, the vector to be multiplied by the Hessian matrix.
        
        Returns
        -------
        rosen_hess_prod : ndarray
            The Hessian matrix of the Rosenbrock function at `x` multiplied
            by the vector `p`.
        
        See Also
        --------
        rosen, rosen_der, rosen_hess
    
    show_options(solver=None, method=None, disp=True)
        Show documentation for additional options of optimization solvers.
        
        These are method-specific options that can be supplied through the
        ``options`` dict.
        
        Parameters
        ----------
        solver : str
            Type of optimization solver. One of 'minimize', 'minimize_scalar',
            'root', or 'linprog'.
        method : str, optional
            If not given, shows all methods of the specified solver. Otherwise,
            show only the options for the specified method. Valid values
            corresponds to methods' names of respective solver (e.g. 'BFGS' for
            'minimize').
        disp : bool, optional
            Whether to print the result rather than returning it.
        
        Returns
        -------
        text
            Either None (for disp=False) or the text string (disp=True)
        
        Notes
        -----
        The solver-specific methods are:
        
        `scipy.optimize.minimize`
        
        - :ref:`Nelder-Mead <optimize.minimize-neldermead>`
        - :ref:`Powell      <optimize.minimize-powell>`
        - :ref:`CG          <optimize.minimize-cg>`
        - :ref:`BFGS        <optimize.minimize-bfgs>`
        - :ref:`Newton-CG   <optimize.minimize-newtoncg>`
        - :ref:`L-BFGS-B    <optimize.minimize-lbfgsb>`
        - :ref:`TNC         <optimize.minimize-tnc>`
        - :ref:`COBYLA      <optimize.minimize-cobyla>`
        - :ref:`SLSQP       <optimize.minimize-slsqp>`
        - :ref:`dogleg      <optimize.minimize-dogleg>`
        - :ref:`trust-ncg   <optimize.minimize-trustncg>`
        
        `scipy.optimize.root`
        
        - :ref:`hybr              <optimize.root-hybr>`
        - :ref:`lm                <optimize.root-lm>`
        - :ref:`broyden1          <optimize.root-broyden1>`
        - :ref:`broyden2          <optimize.root-broyden2>`
        - :ref:`anderson          <optimize.root-anderson>`
        - :ref:`linearmixing      <optimize.root-linearmixing>`
        - :ref:`diagbroyden       <optimize.root-diagbroyden>`
        - :ref:`excitingmixing    <optimize.root-excitingmixing>`
        - :ref:`krylov            <optimize.root-krylov>`
        - :ref:`df-sane           <optimize.root-dfsane>`
        
        `scipy.optimize.minimize_scalar`
        
        - :ref:`brent       <optimize.minimize_scalar-brent>`
        - :ref:`golden      <optimize.minimize_scalar-golden>`
        - :ref:`bounded     <optimize.minimize_scalar-bounded>`
        
        `scipy.optimize.linprog`
        
        - :ref:`simplex     <optimize.linprog-simplex>`

DATA
    __all__ = ['LbfgsInvHessProduct', 'OptimizeResult', 'OptimizeWarning',...
    absolute_import = _Feature((2, 5, 0, 'alpha', 1), (3, 0, 0, 'alpha', 0...
    division = _Feature((2, 2, 0, 'alpha', 2), (3, 0, 0, 'alpha', 0), 8192...
    print_function = _Feature((2, 6, 0, 'alpha', 2), (3, 0, 0, 'alpha', 0)...


For Machine Learning, we are mainly interested in unconstrained minimization of multivariate scalar functions (typically where gradient information is available). In addition to several algorithms for unconstrained minimization of multivariate scalar functions (e.g. BFGS, Nelder-Mead simplex, Newton Conjugate Gradient, etc.) the module also contains:

  • Global (brute-force) optimization routines
  • Least-squares minimization (which we saw before in the Linear Algebra Notebook)
  • Scalar univariate function minimizers and root finders; and
  • Multivariate equation system solvers using a variety of algorithms

Unconstrained minimization of multivariate scalar functions (minimize)

The minimize function provides a common interface to unconstrained and constrained minimization algorithms for multivariate scalar functions. To demonstrate the minimization function, let's consider the problem of minimizing the Rosenbrock function of $N$ variables: $$ f\left(\mathbf{x}\right)=\sum_{i=1}^{N-1}100\left(x_{i}-x_{i-1}^{2}\right)^{2}+\left(1-x_{i-1}\right)^{2}.$$

The minimum value of this function is 0 which is achieved when $x_i=1$.

Note that the Rosenbrock function and its derivatives are included in scipy.optimize. The implementations in the following provide examples of how to define an objective function as well as its Jacobian and Hessian functions.

Nelder-Mead Simplex algorithm (method='Nelder-Mead')

In the example below, the minimize routine is used with the Nelder-Mead simplex algorithm (selected through the method parameter):


In [4]:
import numpy as np
from scipy.optimize import minimize

In [5]:
def rosen(x):
    """The Rosenbrock function"""
    return sum(100.0*(x[1:]-x[:-1]**2.0)**2.0 + (1-x[:-1])**2.0)

In [6]:
x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])
res = minimize(rosen, x0, method='nelder-mead',
               options={'xtol': 1e-8, 'disp': True})
print(res.x)


Optimization terminated successfully.
         Current function value: 0.000000
         Iterations: 339
         Function evaluations: 571
[ 1.  1.  1.  1.  1.]

The simplex method is a simple way to minimize a fairly well-behaved function. It only requires function evaluations and is a good choice for simple minimization problems. However, because it does not use any gradient evaluations, it may take longer to find the minimum.

Broyden-Fletcher-Golfarb-Shanno algorithm (method='BFGS')

In order to converge more quickly to the solution, this routine uses the gradient of the objective function. If the gradient is not given by the user, then it is estimated using first-differences. The Broyden-Fletcher-Golfarb-Shanno (BFGS) method typically requires fewer calls than the simplex algorithm even when the gradient must be estimated.

To demonstrate this algorithm, the Rosenbrock function is used again. The gradient of the Rosenbrock function is the vector:

$$ \begin{eqnarray*} \frac{\partial f}{\partial x_{j}} & = & \sum_{i=1}^{N}200\left(x_{i}-x_{i-1}^{2}\right)\left(\delta_{i,j}-2x_{i-1}\delta_{i-1,j}\right)-2\left(1-x_{i-1}\right)\delta_{i-1,j}.\\ & = & 200\left(x_{j}-x_{j-1}^{2}\right)-400x_{j}\left(x_{j+1}-x_{j}^{2}\right)-2\left(1-x_{j}\right).\end{eqnarray*}$$

This expression is vaalid for the interior derivatives. Special cases are:

$$ \begin{eqnarray*} \frac{\partial f}{\partial x_{0}} & = & -400x_{0}\left(x_{1}-x_{0}^{2}\right)-2\left(1-x_{0}\right),\\ \frac{\partial f}{\partial x_{N-1}} & = & 200\left(x_{N-1}-x_{N-2}^{2}\right).\end{eqnarray*} $$

A function which computes this gradient is:


In [7]:
# note the special handling of the exterior derivatives
def rosen_der(x):
    xm = x[1:-1]
    xm_m1 = x[:-2]
    xm_p1 = x[2:]
    der = np.zeros_like(x)
    der[1:-1] = 200*(xm-xm_m1**2) - 400*(xm_p1 - xm**2)*xm - 2*(1-xm)
    der[0] = -400*x[0]*(x[1]-x[0]**2) - 2*(1-x[0])
    der[-1] = 200*(x[-1]-x[-2]**2)
    return der

This gradient information is specified in the minimize function through the jac parameter:


In [8]:
res = minimize(rosen, x0, method='BFGS', jac=rosen_der,
               options={'disp': True})
print res.x


Optimization terminated successfully.
         Current function value: 0.000000
         Iterations: 25
         Function evaluations: 30
         Gradient evaluations: 30
[ 1.00000004  1.0000001   1.00000021  1.00000044  1.00000092]

Machine learning libraries (e.g. Tensorflow, Theano, Torch etc.) will provide a similar interface. When they provide auto-differentiation capabilities, you will not need to worry about writing the derivative function yourself. You will need to provide the "forward" computational graph and an objective.

Black-box function optimization with skopt

Scikit-Optimize, or skopt, is a simple and efficient library to minimize (very) expensive and noisy black-box functions. It implements several methods for sequential model-based optimization.

Alternative libraries include Spearmint, PyBO, and Hyperopt.

Black-box algorithms do not need any knowledge of the gradient. These libraries provide algorithms that are more powerful and scale better than the Nelder-Mead simplex algorithm above. Modern black-box (or sequential model-based) optimization algorithms are increasingly popular for optimizing the hyperparameters (user-tuned "knobs") of machine learning models. We'll talk more about this later.

For now, just a brief example, which is taken from the skopt Bayesian Optimization tutorial:


In [9]:
import numpy as np
from skopt import gp_minimize

Let's assume the following noisy function $f$:


In [10]:
noise_level = 0.1

def f(x, noise_level=noise_level):
    return np.sin(5 * x[0]) * (1 - np.tanh(x[0] ** 2)) + np.random.randn() * noise_level

In skopt, functions $f$ are assumed to take as input a 1D vector $x$ represented as an array-like and to return a scalar $f(x)$:


In [11]:
# Plot f(x) + contours
x = np.linspace(-2, 2, 400).reshape(-1, 1)
fx = [f(x_i, noise_level=0.0) for x_i in x]
plt.plot(x, fx, "r--", label="True (unknown)")
plt.fill(np.concatenate([x, x[::-1]]),
         np.concatenate(([fx_i - 1.9600 * noise_level for fx_i in fx], 
                         [fx_i + 1.9600 * noise_level for fx_i in fx[::-1]])),
         alpha=.2, fc="r", ec="None")
plt.legend()
plt.grid()


Bayesian Optimization based on Gaussian Process regression is implemented in skopt.gp_minimize and can be carried out as follows:


In [13]:
res = gp_minimize(f,                  # the function to minimize
                  [(-2.0, 2.0)],      # the bounds on each dimension of x
                  acq_func="EI",      # the acquisition function
                  n_calls=15,         # the number of evaluations of f 
                  n_random_starts=5,  # the number of random initialization points
                  noise=0.1**2,       # the noise level (optional)
                  random_state=123)   # the random seed

Accordingly, the approximated minimum is found to be:


In [14]:
print "x^*=%.4f, f(x^*)=%.4f" % (res.x[0], res.fun)


x^*=-0.2939, f(x^*)=-0.9113

For further inspection of the results, attributes of the res named tuple provide the following information:

  • x [float]: location of the minimum.
  • fun [float]: function value at the minimum.
  • models: surrogate models used for each iteration.
  • x_iters [array]: location of function evaluation for each iteration.
  • func_vals [array]: function value for each iteration.
  • space [Space]: the optimization space.
  • specs [dict]: parameters passed to the function.

In [23]:
print(res)


          fun: -0.80273097245820868
    func_vals: array([-0.58595405,  0.24648768,  0.11057704,  0.92336104, -0.3223665 ,
       -0.01593612, -0.38093039, -0.15015809, -0.04993988, -0.00714723,
        0.01772646, -0.02936562, -0.80273097, -0.79999509, -0.59693458])
       models: [GaussianProcessRegressor(alpha=0.0, copy_X_train=True,
             kernel=1**2 * Matern(length_scale=1, nu=2.5) + WhiteKernel(noise_level=0.01),
             n_restarts_optimizer=2, noise=0.01, normalize_y=True,
             optimizer='fmin_l_bfgs_b',
             random_state=<mtrand.RandomState object at 0x118d92690>), GaussianProcessRegressor(alpha=0.0, copy_X_train=True,
             kernel=1**2 * Matern(length_scale=1, nu=2.5) + WhiteKernel(noise_level=0.01),
             n_restarts_optimizer=2, noise=0.01, normalize_y=True,
             optimizer='fmin_l_bfgs_b',
             random_state=<mtrand.RandomState object at 0x10f9e0b90>), GaussianProcessRegressor(alpha=0.0, copy_X_train=True,
             kernel=1**2 * Matern(length_scale=1, nu=2.5) + WhiteKernel(noise_level=0.01),
             n_restarts_optimizer=2, noise=0.01, normalize_y=True,
             optimizer='fmin_l_bfgs_b',
             random_state=<mtrand.RandomState object at 0x10f9e0780>), GaussianProcessRegressor(alpha=0.0, copy_X_train=True,
             kernel=1**2 * Matern(length_scale=1, nu=2.5) + WhiteKernel(noise_level=0.01),
             n_restarts_optimizer=2, noise=0.01, normalize_y=True,
             optimizer='fmin_l_bfgs_b',
             random_state=<mtrand.RandomState object at 0x10f9e0af0>), GaussianProcessRegressor(alpha=0.0, copy_X_train=True,
             kernel=1**2 * Matern(length_scale=1, nu=2.5) + WhiteKernel(noise_level=0.01),
             n_restarts_optimizer=2, noise=0.01, normalize_y=True,
             optimizer='fmin_l_bfgs_b',
             random_state=<mtrand.RandomState object at 0x10f9e0460>), GaussianProcessRegressor(alpha=0.0, copy_X_train=True,
             kernel=1**2 * Matern(length_scale=1, nu=2.5) + WhiteKernel(noise_level=0.01),
             n_restarts_optimizer=2, noise=0.01, normalize_y=True,
             optimizer='fmin_l_bfgs_b',
             random_state=<mtrand.RandomState object at 0x118d13460>), GaussianProcessRegressor(alpha=0.0, copy_X_train=True,
             kernel=1**2 * Matern(length_scale=1, nu=2.5) + WhiteKernel(noise_level=0.01),
             n_restarts_optimizer=2, noise=0.01, normalize_y=True,
             optimizer='fmin_l_bfgs_b',
             random_state=<mtrand.RandomState object at 0x118d13d20>), GaussianProcessRegressor(alpha=0.0, copy_X_train=True,
             kernel=1**2 * Matern(length_scale=1, nu=2.5) + WhiteKernel(noise_level=0.01),
             n_restarts_optimizer=2, noise=0.01, normalize_y=True,
             optimizer='fmin_l_bfgs_b',
             random_state=<mtrand.RandomState object at 0x118d13410>), GaussianProcessRegressor(alpha=0.0, copy_X_train=True,
             kernel=1**2 * Matern(length_scale=1, nu=2.5) + WhiteKernel(noise_level=0.01),
             n_restarts_optimizer=2, noise=0.01, normalize_y=True,
             optimizer='fmin_l_bfgs_b',
             random_state=<mtrand.RandomState object at 0x118d13c80>), GaussianProcessRegressor(alpha=0.0, copy_X_train=True,
             kernel=1**2 * Matern(length_scale=1, nu=2.5) + WhiteKernel(noise_level=0.01),
             n_restarts_optimizer=2, noise=0.01, normalize_y=True,
             optimizer='fmin_l_bfgs_b',
             random_state=<mtrand.RandomState object at 0x11bd0d870>)]
 random_state: <mtrand.RandomState object at 0x10f922c30>
        space: Space([Real(low=-2.0, high=2.0, prior=uniform, transform=identity)])
        specs: {'function': 'base_minimize', 'args': {'base_estimator': GaussianProcessRegressor(alpha=0.0, copy_X_train=True,
             kernel=1**2 * Matern(length_scale=1, nu=2.5),
             n_restarts_optimizer=2, noise=0.01, normalize_y=True,
             optimizer='fmin_l_bfgs_b',
             random_state=<mtrand.RandomState object at 0x10f933460>), 'xi': 0.01, 'verbose': False, 'n_points': 10000, 'acq_func': 'EI', 'n_jobs': 1, 'func': <function f at 0x117a9b9b0>, 'n_restarts_optimizer': 5, 'y0': None, 'x0': None, 'kappa': 1.96, 'dimensions': [(-2.0, 2.0)], 'n_calls': 15, 'acq_optimizer': 'lbfgs', 'callback': None, 'random_state': 123, 'n_random_starts': 5}}
            x: [-0.35694104829499329]
      x_iters: [[0.78587674239144656], [-0.85544266019848214], [-1.0925941857431876], [0.20525907633156493], [0.87787587914225229], [0.68614176817095951], [0.81071157887355993], [1.0878721001533365], [1.3414489647511012], [1.6962814860628863], [-1.9995406233569253], [-1.5753814439271561], [-0.35694104829499329], [-0.31300143998063323], [-0.42550176300353354]]

Together these attributes can be used to visually inspect the results of the minimization, such as the convergence trace or the acquisition function at the last iteration:


In [24]:
from skopt.plots import plot_convergence
plot_convergence(res);